Abstract. Let $T_1, \ldots, T_l : X \to X$ be commuting measure-preserving
transformations on a probability space $(X, \mathcal{X}, \mu)$. We show that the
multiple ergodic averages $\frac{1}{N} \sum_{n=0}^{N-1} f_1(T_1^n x) \cdots f_l(T_l^n x)$
are convergent in $L^2(X, \mathcal{X}, \mu)$ as $N \to \infty$ for all
$f_1, \ldots, f_l \in L^\infty(X, \mathcal{X}, \mu)$; this was previously established for
$l = 2$ by Conze and Lesigne [3] and for general $l$ assuming some additional
ergodicity hypotheses on the maps $T_i$ and $T_i T_j^{-1}$ by Frantzikinakis and Kra
[4] (with the $l = 3$ case of this result established earlier in [30]). Our approach
is combinatorial and finitary in nature, inspired by recent developments
regarding the hypergraph regularity and removal lemmas, although we will not
need the full strength of those lemmas. In particular, the $l = 2$ case of our
arguments is a finitary analogue of those in [3].



1. Introduction 



The purpose of this paper is to establish the following norm convergence result for 
multiple commuting transformations. 

Theorem 1.1 (Norm convergence). Let $l \geq 1$ be an integer. Assume that
$T_1, \ldots, T_l : X \to X$ are commuting invertible measure-preserving
transformations of a measure space $(X, \mathcal{X}, \mu)$. Then for any
$f_1, \ldots, f_l \in L^\infty(X, \mathcal{X}, \mu)$, the averages

$\frac{1}{N} \sum_{n=0}^{N-1} f_1(T_1^n x) \cdots f_l(T_l^n x)$

are convergent in $L^2(X, \mathcal{X}, \mu)$.

Remark 1.2. By using Hölder's inequality and a limiting argument, one can relax
the $L^\infty$ conditions on $f_i$ to $L^{p_i}$ conditions for certain finite exponents
$p_i$. For similar reasons, one can also replace the $L^2$ norm with other $L^p$
norms, provided that $\frac{1}{p} \geq \frac{1}{p_1} + \cdots + \frac{1}{p_l}$. We omit
the standard details.

The case $l = 1$ is essentially the mean ergodic theorem. The case $l = 2$ is due to
Conze and Lesigne [3]. This result had been obtained by Zhang [30] for $l = 3$ and
Frantzikinakis and Kra [4] for general $l$ under the additional hypotheses that each
of the $T_i$ and the $T_i T_j^{-1}$ (for $i \neq j$) are individually ergodic
transformations. The result was also obtained by Lesigne [TB] for certain distal
systems and by Berend and Bergelson [5] for certain weakly mixing systems. In the
important special case $T_i = T^i$ for some measure-preserving transformation
$T : X \to X$, this result was first obtained for general $l$ by Host and Kra [12]
(with a different proof given subsequently by Ziegler [3Pj).

All of the arguments mentioned above approach the norm convergence problem
through the techniques of ergodic theory, for instance by constructing
characteristic factors for the above system. Here we shall adopt a somewhat
different-looking approach, which is based on running the Furstenberg
correspondence principle in reverse to deduce the above ergodic theory result from
a purely combinatorial result (much as the Furstenberg recurrence theorem [6] can
be deduced from Szemerédi's theorem [22]). More precisely, we shall deduce Theorem
1.1 from the following "finitary" version, in which the general measure-preserving
system $(X, \mathcal{X}, \mu, T_1, \ldots, T_l)$ has been replaced by the finite abelian
group $\mathbb{Z}_P^l := (\mathbb{Z}/P\mathbb{Z})^l$ for some large integer $P$, with the
discrete $\sigma$-algebra, the uniform probability measure, and the standard $l$
commuting shifts $T_i x := x + e_i$.

Definition 1.3 (Expectation notation). For any finite set $B$ and any function
$f : B \to \mathbb{R}$, we use $|B|$ to denote the cardinality of $B$, and define the
average $\mathbb{E}_{x \in B} f(x) := \frac{1}{|B|} \sum_{x \in B} f(x)$. In particular, if
$N$ is a positive integer, we use $[N]$ to denote the discrete interval
$[N] := \{0, 1, \ldots, N-1\}$, and thus
$\mathbb{E}_{n \in [N]} f(n) = \frac{1}{N} \sum_{n=0}^{N-1} f(n)$.

Definition 1.4 (Finitary averages). Let $l \geq 1$ and $P \geq 1$. We let
$e_1, \ldots, e_l$ be the standard generators of the finite additive group
$\mathbb{Z}_P^l$. For any functions $f_1, \ldots, f_l : \mathbb{Z}_P^l \to \mathbb{R}$ and any
$N \geq 1$, we define the multiple average
$A_N(f_1, \ldots, f_l) : \mathbb{Z}_P^l \to \mathbb{R}$ by the formula

$A_N(f_1, \ldots, f_l)(a) := \mathbb{E}_{n \in [N]} \prod_{i=1}^{l} f_i(a + n e_i)$.

Example 1.5. If $l = 2$ and $f_1, f_2 : \mathbb{Z}_P^2 \to \mathbb{R}$, then

$A_N(f_1, f_2)(v_1, v_2) = \frac{1}{N} \sum_{n=0}^{N-1} f_1(v_1 + n, v_2) f_2(v_1, v_2 + n)$

for all $v_1, v_2 \in \mathbb{Z}_P$.
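The finitary averages of Definition 1.4 are concrete enough to compute by brute force for small $P$. The following Python sketch is purely illustrative and is not part of the argument; the names `multiple_average` and `l2_distance` are hypothetical, and functions on $\mathbb{Z}_P^l$ are assumed to be given as Python callables taking a tuple of residues.

```python
from itertools import product


def multiple_average(fs, P, N, a):
    """Evaluate A_N(f_1, ..., f_l)(a) = E_{n in [N]} prod_i f_i(a + n e_i).

    fs is a list of l callables on tuples in Z_P^l, and a is a tuple of
    l residues modulo P.
    """
    l = len(fs)
    total = 0.0
    for n in range(N):
        term = 1.0
        for i, f in enumerate(fs):
            # Shift only the i-th coordinate of a by n, reducing modulo P.
            shifted = tuple((a[j] + n) % P if j == i else a[j] for j in range(l))
            term *= f(shifted)
        total += term
    return total / N


def l2_distance(fs, P, N1, N2):
    """L^2(Z_P^l) distance between A_{N1} and A_{N2}, uniform measure."""
    l = len(fs)
    squares = [
        (multiple_average(fs, P, N1, a) - multiple_average(fs, P, N2, a)) ** 2
        for a in product(range(P), repeat=l)
    ]
    return (sum(squares) / len(squares)) ** 0.5
```

The cost is of order $P^l N l$ evaluations, so this is only practical for very small parameters; it is meant as a way to experiment with the quantity appearing in Theorem 1.6 below, not as an efficient method.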

We let $\mathbb{N} := \{1, 2, 3, \ldots\}$ denote the positive natural numbers.

Theorem 1.6 (Finitary norm convergence). Let $l \geq 1$ be an integer, let
$F : \mathbb{N} \to \mathbb{N}$ be a function, and let $\varepsilon > 0$. Then there exists an
integer $M^* > 0$ with the following property: If $P \geq 1$ and
$f_1, \ldots, f_l : \mathbb{Z}_P^l \to [-1, 1]$ are arbitrary functions on $\mathbb{Z}_P^l$,
then there exists an integer $1 \leq M \leq M^*$ such that we have the
"$L^2$ metastability"

(1) $\| A_N(f_1, \ldots, f_l) - A_{N'}(f_1, \ldots, f_l) \|_{L^2(\mathbb{Z}_P^l)} \leq \varepsilon$

for all $M \leq N, N' \leq F(M)$, where we give $\mathbb{Z}_P^l$ the uniform probability
measure.

Remark 1.7. For applications, Theorem 1.6 is only of interest in the regime where
$F(M)$ is much larger than $M$, and $P$ is extremely large compared to $l$, $F$, or
$\varepsilon$. The key points are that the function $F$ is arbitrary (thus one has
arbitrarily high quality regions of $L^2$ metastability), and that the upper bound
$M^*$ on $M$ is independent of $P$. The $l = 1$ version of this theorem was
essentially established (with $\mathbb{Z}_P^l$ replaced by an arbitrary
measure-preserving system) in [1].

Remark 1.8. The presence of the arbitrary function $F : \mathbb{N} \to \mathbb{N}$ may
appear strange, but this is in fact a natural consequence of the "quantifier
elimination" necessary¹ in order to finitise a convergence result. For instance, if
$f_1, f_2, \ldots$ are a sequence in a normed vector space $V$, observe that the
statement

$f_1, f_2, \ldots$ are a Cauchy sequence in $V$

is by definition equivalent to the assertion that for every $\varepsilon > 0$ there
exists $M \geq 1$ such that

$\| f_N - f_{N'} \|_V \leq \varepsilon$ for all $N, N' \geq M$,

and that this in turn is equivalent to the assertion that for every
$\varepsilon > 0$ and every $F : \mathbb{N} \to \mathbb{N}$, there exists $M \geq 1$ such that

(2) $\| f_N - f_{N'} \|_V \leq \varepsilon$ for all $M \leq N, N' \leq F(M)$.

Philosophically, the statement (2) looks easier to prove because (once one fixes
the function $F$) one is only asking for the sequence $f_N$ to be metastable rather
than stable - i.e. stable on a finite range $[M, F(M)]$ rather than an infinite
range $[M, +\infty)$. This allows us to perform pigeonholing tricks based on
locating several disjoint intervals of the form $[M, F(M)]$, as was recently
carried out in [26]. Indeed our arguments here have some of the "multiscale
analysis" flavour of [26]. See also [23], in which functions such as $F$ play a key
role in establishing a hypergraph regularity lemma.
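To make the metastability condition (2) concrete, here is a small Python sketch for real-valued sequences. It is an illustration only: the function name `find_metastable_start` and the search bound `M_max` are assumptions introduced here (the theorem guarantees a bound $M^*$ but the code does not compute it), and the sequence and $F$ are assumed to be given as callables on positive integers.

```python
def find_metastable_start(seq, F, eps, M_max):
    """Return the least M in [1, M_max] such that |seq(N) - seq(N')| <= eps
    for all M <= N, N' <= F(M), or None if no such M exists up to M_max."""
    for M in range(1, M_max + 1):
        window = [seq(N) for N in range(M, F(M) + 1)]
        # For real numbers, every pairwise gap is <= eps exactly when
        # max - min <= eps. An empty window (F(M) < M) holds vacuously.
        if not window or max(window) - min(window) <= eps:
            return M
    return None
```

For example, with `seq = lambda N: 1.0 / N`, `F = lambda M: 2 * M` and `eps = 0.15`, the search stops at the first $M$ with $1/(2M) \leq 0.15$. A bounded oscillating sequence such as $(-1)^N$ is metastable for the trivial choice $F(M) = M$ but not for $F(M) = M + 1$ with $\varepsilon = 1$, which reflects the fact that only the statement for all $F$ is equivalent to the Cauchy property.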

We shall establish Theorem 1.6 by "finitary ergodic theory" techniques,
reminiscent of those used in [9] to establish arbitrarily long arithmetic
progressions in the primes. For instance, instead of building infinitary
characteristic factors as was done in earlier work on this problem, we shall build
finitary characteristic factors out of "anti-uniform functions" analogous to those
in [5]. This allows us to essentially reduce Theorem 1.1 to the case in which all
the functions $f_1, \ldots, f_l$ are anti-uniform functions (which will in turn be
polynomial combinations of basic anti-uniform functions). The anti-uniformity
allows one to reduce the complexity of the average, and very roughly speaking
allows one to deduce the $l$-dimensional convergence result in Theorem 1.6 from an
$(l-1)$-dimensional convergence result². However, for technical reasons, we will
not induct on Theorem 1.6 directly, but on a more complicated counterpart (see
Theorem 4.1 below), and induct on a "complexity" $d$ rather than a "dimension" $l$.

Interestingly, the theory of nilsystems (or spectral theory, or Fourier analysis)
does not play any role in our arguments (in sharp contrast to [H] or [21]),
although the cubes and Gowers-type norms which appear in [12] have a faint
presence here via our machinery of anti-uniform functions. Similarly, the full
strength of tools such as the hypergraph regularity lemma is not needed; instead
we need the weaker
1 In proof theory, this finitisation is known as the Gödel functional interpretation
of the infinitary statement, which is also closely related to the Kreisel
no-counterexample interpretation [14], [15] or Herbrand normal form of such
statements; see [13] for further discussion. We thank Ulrich Kohlenbach for
pointing out this connection.

2 This is analogous to how the argument in [3] deduced the $l = 2$ case of Theorem
1.1 from various one-dimensional convergence results such as the mean and Birkhoff
ergodic theorems. Indeed our own proof of the $l = 2$ case of Theorem 1.1 was
inspired (albeit somewhat indirectly) by the arguments in [3].






TERENCE TAO 



"Koopman-von Neumann" counterparts to such regularity lemmas (analogous to 
the "weak regularity lemma" of Frieze and Kannan [5]). As with other applications
of graph and hypergraph methods, the $\mathbb{Z}^l$ group action in fact plays
remarkably little role in these arguments, although the standard fact that this
group is amenable³ will be implicitly used at several crucial junctures (basically
allowing us to treat coarse scale averages as an average of fine scale averages,
modulo negligible error⁴).

The main advantage of working in the finitary setting, as opposed to the more
traditional infinitary one, is that the underlying dynamical system becomes
extremely explicit, being simply the standard shifts on $\mathbb{Z}_P^l$. In
particular we have a Cartesian product structure which allows us to construct
various product sets⁵ in our dynamical system, without having to pay attention to
technical issues such as measurability. This product structure will be crucial to
our arguments (it basically endows our system with the structure of a hypergraph).
It seems of interest to try to obtain similar product structures in the
traditional infinitary setting; the argument in [3] achieves this to some extent
in the $l = 2$ case. (See also [35] for another (not entirely satisfactory) attempt
to endow dynamical systems with hypergraph structure.) This would likely lead to a
more traditional infinitary proof of Theorem 1.1.

Finally, we remark that our methods do not seem to extend to the significantly more 
difficult question of pointwise almost everywhere convergence of these averages; for 
that task, some sort of multilinear maximal inequality may be needed. 

1.9. Organisation. This paper is organised as follows. Firstly, in Section 2 we
use the Furstenberg correspondence principle in the reverse direction to deduce
the infinitary convergence theorem, Theorem 1.1, from its finitary counterpart,
Theorem 1.6. Then, in Section 3 we set out the basic notation we need to establish
Theorem 1.6. In Section 4 we deduce Theorem 1.6 from a more technical variant,
Theorem 4.1, which is in a form suitable for applying mathematical induction on
a certain "complexity" parameter $d$. The base case $d = 1$ (which is essentially a
finitary analogue of the mean ergodic theorem, as in [1]) is then handled in
Section 5; these arguments are then generalised to handle the inductive case
$d > 1$ in Section 6.

At several points in the argument it will be convenient to pass from a "probabilistic" 
norm convergence result to a "deterministic" one. The natural tool for this is the 
Lebesgue dominated convergence theorem, but as we are working in a finitary 



3 For instance, one can establish analogues of our results in which $\mathbb{Z}$ is
replaced with the infinite vector space $F^{\mathbb{N}}$ over a finite field $F$
generated by an infinite basis $e_1, e_2, \ldots$, and the intervals $[N]$ are
replaced with the subspaces spanned by $e_1, \ldots, e_N$. In fact the proof in this
finite field case is somewhat easier than in the integer case due to the perfect
nesting of the scales.

4 For a specific example of this, if $T$ is a shift operator and $S_N$ are the
averaging operators $S_N := \mathbb{E}_{n \in [N]} T^n$, observe for $1 \leq M \leq N$
that $S_M S_N$ and $S_N$ differ in $L^2$ operator norm by only $O(M/N)$, and thus we
have the heuristic $S_M S_N \approx S_N$ in the regime $N \gg M$.

5 Actually, as is usual in the hypergraph approach to recurrence problems, we shall
lift $\mathbb{Z}_P^l$ to $\mathbb{Z}_P^{l+1}$ in order to abstract away the arithmetic
aspects of the shift operations; see Section 4 below.

setting, we of course need a finitary counterpart of this infinitary theorem. We
discuss such a counterpart in an appendix to this paper. Actually it is possible to
skip this dominated convergence step and work entirely in a probabilistic setting
throughout, but this causes the notation to be slightly more complicated.

1.10. Acknowledgements. We thank Ciprian Demeter for explaining the argument in
[3], for encouragement, and for bringing the norm convergence problem to our
attention. We thank Henry Towsner and Ulrich Kohlenbach for bringing the author's
attention to [1] and for pointing out the connections to proof theory.
We also thank Jennifer Chayes for suggesting the term "metastability" , and Tim 
Austin, Ciprian Demeter, Henry Towsner, and Christoph Thiele for helpful dis- 
cussions. Finally, we thank the anonymous referee for a careful reading of the 
manuscript and for many suggestions and corrections. The author is supported by 
NSF grant CCF-0649473 and a grant from the MacArthur Foundation. 

2. The reverse Furstenberg correspondence principle 

In this section we show how to reverse the Furstenberg correspondence principle [6]
to deduce Theorem 1.1 from Theorem 1.6.

Proof of Theorem 1.1 assuming Theorem 1.6. Observe that the $l$ commuting
transformations generate a measure-preserving action of $\mathbb{Z}^l$ on the system
$(X, \mathcal{X}, \mu)$. We claim that we may reduce⁶ to the case when this action is
ergodic, i.e. the only sets which are invariant under all of the
$T_1, \ldots, T_l$ have either zero measure or full measure. Note that this is a
much weaker property than requiring that each of the $T_1, \ldots, T_l$ (or the
$T_i T_j^{-1}$) are individually ergodic. This reduction is standard and performed
for instance in [3, page 157], so we only sketch it here. Using the ergodic
decomposition (see e.g. [7]) one can disintegrate $\mu$ as an integral of measures
$\mu_y$, such that each $\mu_y$ is invariant and ergodic with respect to the
$\mathbb{Z}^l$ action. By hypothesis, the averages
$\mathbb{E}_{n \in [N]} f_1(T_1^n x) \cdots f_l(T_l^n x)$ are convergent, hence Cauchy,
in each of the $L^2(X, \mathcal{X}, \mu_y)$; they are also bounded between $-1$ and
$1$. By the dominated convergence theorem we conclude that these averages are
Cauchy, hence convergent, in $L^2(X, \mathcal{X}, \mu)$, as desired.

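Henceforth we assume the $\mathbb{Z}^l$ action to be ergodic on
$(X, \mathcal{X}, \mu)$. Applying the Birkhoff pointwise ergodic theorem for
$\mathbb{Z}^l$ (see [28]), we see in particular that for any
$f \in L^\infty(X, \mathcal{X}, \mu)$ we have

(3) $\lim_{P \to \infty} \mathbb{E}_{v \in [P]^l} f(T^v x_0) = \int_X f \, d\mu$

for almost every $x_0$, where we adopt the convention

$T^{(v_1, \ldots, v_l)} := T_1^{v_1} \cdots T_l^{v_l}$.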

6 Actually, this reduction step, as well as the step involving the generic point
$x_0$ below, is not strictly necessary to our argument, provided that one is
willing to replace Theorem 1.6 by the more complicated-looking generalisation in
Theorem 4.1 below.

Let us say that a point $x_0$ is generic if (3) holds for all functions $f$ which
are polynomial combinations of the $f_1, \ldots, f_l$ with rational coefficients.
Since there are only countably many such functions, we see that almost every point
is generic.

Fix a generic point $x_0$. Recall that our objective is to show that the sequence
of functions

$\mathbb{E}_{n \in [N]} \prod_{i=1}^{l} f_i(T_i^n x)$

is convergent in $L^2(X, \mathcal{X}, \mu)$. It of course suffices to show that it is
a Cauchy sequence. If this is not the case, then there exists $\varepsilon > 0$ with
the property that for every integer $M > 0$, there exists an integer $F(M) > M$
such that

(4) $\int_X \Big| \mathbb{E}_{n \in [F(M)]} \prod_{i=1}^{l} f_i(T_i^n x) - \mathbb{E}_{n \in [M]} \prod_{i=1}^{l} f_i(T_i^n x) \Big|^2 \, d\mu(x) > 3\varepsilon^2$

(say). Fix this $\varepsilon$ and $F$. Applying (3), we can write the left-hand side
of (4) as



lim E 

P-»oo 



v£[P]> 



E ne[F(A/)] JJ fi{T?T v x ) - E„ e[M ] JJ fi(T™T v x ) 



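Let $M^*$ be the integer depending on $l, \varepsilon, F$ which appears in Theorem
1.6. Then, if $P$ is sufficiently large depending on $M^*, F, f_i, x_0, \varepsilon$,
we can ensure that

(5) $\mathbb{E}_{v \in [P]^l} \Big| \mathbb{E}_{n \in [F(M)]} \prod_{i=1}^{l} f_i(T_i^n T^v x_0) - \mathbb{E}_{n \in [M]} \prod_{i=1}^{l} f_i(T_i^n T^v x_0) \Big|^2 > 2\varepsilon^2$

for all $1 \leq M \leq M^*$.
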

We now assume $P$ large enough so that the above properties hold. Define the
functions $g_1, \ldots, g_l : \mathbb{Z}_P^l \to [-1, 1]$ by setting

$g_i(v) := f_i(T^v x_0)$

for all $v \in \mathbb{Z}_P^l$, where we artificially identify $\mathbb{Z}_P$ with $[P]$
in the usual manner. From (5) we see that

$\mathbb{E}_{v \in [P]^l} | A_{F(M)}(g_1, \ldots, g_l)(v) - A_M(g_1, \ldots, g_l)(v) |^2 > \varepsilon^2$

for all $1 \leq M \leq M^*$, if $P$ is large enough depending on
$M^*, F, \varepsilon$ (this is necessary to be able to neglect the (rare)
"wraparound effects" caused when the shifts $T_1^n, \ldots, T_l^n$ push one of the
coefficients of $v$ beyond $P$). But this contradicts Theorem 1.6. This
contradiction establishes Theorem 1.1 as desired. □



Remark 2.1. It is also possible to apply the Furstenberg correspondence principle
(as in [6] or [7]) in the more standard direction and deduce Theorem 1.6 from
Theorem 1.1 by using the weak sequential compactness of probability measures on
the discrete cube $\{0, 1\}^{\mathbb{Z}}$. We leave the details to the interested
reader.



It remains to prove Theorem 1.6. This will be the purpose of later sections.

3. Finitary notation



Theorem 1.6 is a statement in "finitary" mathematics - it concerns averages over
finite sets. In this section we lay out some finitary notation which will be of use
in establishing that theorem (and also point out some connections with graph and
hypergraph theory which are implicitly lurking just beneath the surface). We will
of course be heavily using the expectation notation in Definition 1.3. We also
recall some standard asymptotic notation:

Definition 3.1 (Asymptotic notation). We use $A \ll B$ or $B \gg A$ to denote the
bound $A \leq CB$ for some constant $C$, and $O(A)$ to denote any quantity bounded
in magnitude by $CA$. If we wish to allow the constant $C$ to depend on auxiliary
parameters, we will denote this by subscripts, e.g. $O_\eta(A)$ denotes a quantity
bounded by $C_\eta A$ where $C_\eta$ is allowed to depend on $\eta$.

3.2. Factors. Next, we recall the notion of a factor from ergodic theory. 

Definition 3.3 (Factor). Let $(X, \mathcal{X}, \mu)$ be a probability space. A factor
of $(X, \mathcal{X}, \mu)$ is a triplet $\mathcal{Y} = (Y, \mathcal{Y}, \pi)$, where $Y$
is a set, $\mathcal{Y}$ is a $\sigma$-algebra, and $\pi : X \to Y$ is a measurable
map. If $\mathcal{Y}$ is a factor, we let
$\mathcal{B}_{\mathcal{Y}} := \{\pi^{-1}(E) : E \in \mathcal{Y}\}$ be the
sub-$\sigma$-algebra of $\mathcal{X}$ formed by pulling back $\mathcal{Y}$ by $\pi$. A
function $f : X \to \mathbb{R}$ is said to be $\mathcal{Y}$-measurable if it is
measurable with respect to $\mathcal{B}_{\mathcal{Y}}$. If
$f \in L^2(X, \mathcal{X}, \mu)$, we let
$\mathbb{E}(f|\mathcal{Y}) = \mathbb{E}(f|\mathcal{B}_{\mathcal{Y}})$ be the orthogonal
projection of $f$ to the closed subspace $L^2(X, \mathcal{B}_{\mathcal{Y}}, \mu)$ of
$L^2(X, \mathcal{X}, \mu)$ consisting of $\mathcal{Y}$-measurable functions. If
$\mathcal{Y} = (Y, \mathcal{Y}, \pi)$ and $\mathcal{Y}' = (Y', \mathcal{Y}', \pi')$ are
two factors, we let $\mathcal{Y} \vee \mathcal{Y}'$ denote the factor
$(Y \times Y', \mathcal{Y} \otimes \mathcal{Y}', \pi \oplus \pi')$.

Remark 3.4. The concept of a factor in ergodic theory corresponds closely with the
concept of a partition or colouring in graph or hypergraph theory.

Our probability spaces shall usually be finite sets with the uniform distribution.
More precisely, if $Y$ is a finite set, let $2^Y := \{E : E \subset Y\}$ be the
discrete $\sigma$-algebra on $Y$, and let $\mu_Y$ be the uniform probability measure
on $Y$. In particular, $L^2(Y)$ is the finite-dimensional real Hilbert space of
functions $f : Y \to \mathbb{R}$, endowed with the inner product

$\langle f, g \rangle_{L^2(Y)} := \mathbb{E}_{y \in Y} f(y) g(y)$.

More generally, if $X = (X, \mathcal{X}, \mu)$ is another probability space,
$L^2(Y \times X)$ is the real Hilbert space of measurable functions
$f : Y \times X \to \mathbb{R}$, endowed with the inner product

$\langle f, g \rangle_{L^2(Y \times X)} := \mathbb{E}_{y \in Y} \int_X f(y, x) g(y, x) \, d\mu(x)$.

Remark 3.5. Our use of the uniform distribution for $Y$ corresponds to the
customary convention in graph and hypergraph theory to give all vertices, edges,
etc. equal weight. One can of course replace uniform distributions by more general
probability distributions, corresponding to weighted graphs and hypergraphs, but we
will not need to do so here.

In the infinitary theory, we can use any measurable function $f : X \to \mathbb{R}$
to generate a factor of $X$, whose $\sigma$-algebra is generated by the level sets
$f^{-1}([a, b])$ for all $a, b$. The function $f$ will then be measurable with
respect to that factor. Such factors turn out to be far too large for us to use in
the finitary setting (for instance, if $X$ is finite and $f$ takes different values
at each point of $X$, then the above $\sigma$-algebra will be the maximal
$\sigma$-algebra $2^X$). Instead, we will need some slightly coarser factors,
defined via the following lemma.

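Lemma 3.6 (Each function generates its own factor). Let $(X, \mathcal{X}, \mu)$ be a
probability space, let $I \subset \mathbb{R}$ be a compact interval, and let
$\varphi : X \to I$ be a measurable function. Then for any $\eta_0 > 0$ there exists
a factor $\mathcal{Y}_{\eta_0}(\varphi)$ with the following properties.

(i) ($\varphi$ lies in its own factor) For any factor $\mathcal{Y}'$, we have

$\| \varphi - \mathbb{E}(\varphi | \mathcal{Y}_{\eta_0}(\varphi) \vee \mathcal{Y}') \|_{L^\infty(X, \mathcal{X}, \mu)} \leq \eta_0$.

(ii) (Bounded complexity) The $\sigma$-algebra
$\mathcal{B}_{\mathcal{Y}_{\eta_0}(\varphi)}$ is generated by $O_{I, \eta_0}(1)$ atoms.

(iii) (Approximation by polynomials of $\varphi$) If $A$ is any atom in
$\mathcal{B}_{\mathcal{Y}_{\eta_0}(\varphi)}$ and $\eta_1 > 0$, there exists a polynomial
$\Psi_A : I \to [0, 1]$ of degree $O_{I, \eta_0, \eta_1}(1)$ and coefficients
$O_{I, \eta_0, \eta_1}(1)$ such that

$\| 1_A - \Psi_A(\varphi) \|_{L^1(X, \mathcal{X}, \mu)} \leq \eta_1$

and

$\| 1_A - \Psi_A(\varphi) \|_{L^\infty(X, \mathcal{X}, \mu)} \leq 1$.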

Proof. This lemma essentially appears in [9, Proposition 7.2], [53, Proposition
6.1], or [27, Proposition 7.3], so we give only a brief sketch here.

We use the probabilistic method. Let $\alpha \in [0, 1]$ be chosen uniformly at
random. We let $\mathcal{Y}_{\eta_0}(\varphi) = \mathcal{Y}_{\eta_0, \alpha}(\varphi)$ be
the factor

$\mathcal{Y}_{\eta_0}(\varphi) = (I, \mathcal{B}_{\alpha, \eta_0}, \varphi)$

where $\mathcal{B}_{\alpha, \eta_0}$ is the $\sigma$-algebra generated by the intervals
$[(n + \alpha)\eta_0, (n + \alpha + 1)\eta_0)$ for $n \in \mathbb{Z}$. The properties
(i), (ii) are then obvious, so it suffices to verify (iii). Firstly, we observe
that it suffices to verify (iii) in the case where $\eta_1 = 2^{-j}$ for an integer
$j \geq 0$. We will in fact show that for each fixed $j$, (iii) holds with
probability $1 - O_{I, \eta_0}(2^{-j})$; from the union bound we thus see that there
exists a choice of $\alpha$ for which (iii) holds for all $j$ that are sufficiently
large depending on $I$, $\eta_0$, and the claim then follows since the claim for
small $j$ clearly follows from that of large $j$.

Let us now fix $j$. By (ii) and the union bound again, it suffices to verify the
claim for a single atom
$A = \varphi^{-1}([(n + \alpha)\eta_0, (n + \alpha + 1)\eta_0))$, where
$n \in \mathbb{Z}$ is fixed. We define the exceptional set

$B := \{x \in X : |\varphi(x) - (n + \alpha + i)\eta_0| \leq 2^{-2j} \text{ for some } i = 0, 1\}$;

then from Fubini's theorem we see that $B$ has small measure on the average:

$\mathbb{E} \mu(B) \ll 2^{-2j}$.

By Markov's inequality, we thus see that $\mu(B) \leq 2^{-j}/2$ with probability
$1 - O(2^{-j})$. We now apply Urysohn's lemma followed by the Weierstrass
approximation theorem to locate a polynomial $\Psi_A : I \to [0, 1]$ of degree
$O_{I, \eta_0, j}(1)$ and coefficients $O_{I, \eta_0, j}(1)$ such that

$| \Psi_A(t) - 1_{[(n + \alpha)\eta_0, (n + \alpha + 1)\eta_0)}(t) | \leq 2^{-j}/2$

for all $t \in I$ with $|t - (n + \alpha + i)\eta_0| \geq 2^{-2j}$ for $i = 0, 1$.
(Note that $\alpha$ ranges in a compact set, and so the bounds on the degree and
coefficients of $\Psi_A$ are uniform in $\alpha$.) One then easily verifies that

$\| 1_A - \Psi_A(\varphi) \|_{L^1(X, \mathcal{X}, \mu)} \leq 2^{-j}/2 + \mu(B) \leq 2^{-j}$

and the claim (iii) follows. □

Henceforth we fix the assignment $(\varphi, \eta_0) \mapsto \mathcal{Y}_{\eta_0}(\varphi)$
of a factor to each function $\varphi$ and an error tolerance $\eta_0$ as defined above.
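The shifted-grid construction in the proof of Lemma 3.6 is easy to experiment with when $X$ is a finite set with the uniform measure. The following Python sketch is an illustration under that assumption, not the construction used in the paper's later sections: the names `atom_index` and `project_onto_factor` are hypothetical, the shift `alpha` is supplied by the caller rather than drawn at random, and only properties (i) and (ii) are visible in it (the polynomial approximation in (iii) is not implemented).

```python
import math
from collections import defaultdict


def atom_index(value, eta0, alpha):
    """Index n of the grid cell [(n + alpha) * eta0, (n + alpha + 1) * eta0)
    that contains value."""
    return math.floor(value / eta0 - alpha)


def project_onto_factor(phi, eta0, alpha):
    """Conditional expectation of phi onto the factor generated by the
    shifted grid of mesh eta0, on a finite set X = {0, ..., len(phi) - 1}
    carrying the uniform measure.

    Returns the list of conditional expectations and the number of
    non-empty atoms.
    """
    atoms = defaultdict(list)
    for x, value in enumerate(phi):
        atoms[atom_index(value, eta0, alpha)].append(x)
    cond = [0.0] * len(phi)
    for members in atoms.values():
        # On each atom the conditional expectation is the plain average.
        mean = sum(phi[x] for x in members) / len(members)
        for x in members:
            cond[x] = mean
    return cond, len(atoms)
```

Since every atom is the preimage of an interval of length $\eta_0$, the projection differs from $\varphi$ by at most $\eta_0$ pointwise, and the number of atoms is at most about $|I|/\eta_0 + 1$ regardless of the size of $X$. This is the sense in which the factor is "coarse", in contrast with the factor generated by all level sets of $\varphi$.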

3.7. Products, edge factors, and complexity. We shall work frequently with
finite Cartesian products

(6) $Y_I = \prod_{i \in I} Y_i := \{(y_i)_{i \in I} : y_i \in Y_i \text{ for all } i \in I\}$

where $I$ is a finite index set, and the $Y_i$ are also finite sets. We of course
adopt the usual convention that

$Y^n := \prod_{i \in \{1, \ldots, n\}} Y$

for any non-negative integer $n$.

For technical reasons (basically due to our use of probabilistic methods) we will
also need to deal with the slightly larger product spaces

(7) $Y_I \times X = \prod_{i \in I} Y_i \times X$

where $X = (X, \mathcal{X}, \mu)$ is another probability space (possibly infinite).
The space $X$ should be thought of as a "passive" space, as the parameters in $X$
will simply be averaged over at the end of the day, with no non-trivial interaction
with any other parameters in the argument. The space $Y_I \times X$ is then also a
probability space, with the product $\sigma$-algebra $2^{Y_I} \otimes \mathcal{X}$ and
the product measure $\mu_{Y_I} \times \mu$. Of course one can view ordinary Cartesian
products (6) as a special case of (7) in which the probability space $X$ is just a
point, $X = \mathrm{pt}$.

Remark 3.8. In the graph and hypergraph theory language, the sets $Y_i$ should be
viewed as disjoint classes of vertices, and various subsets of $Y_I$ should be
interpreted as partite graphs or hypergraphs, where the edges consist of up to one
vertex from each of the classes $Y_i$. Subsets of the larger space $Y_I \times X$
should be interpreted as random partite graphs or hypergraphs.

Now we come to a crucial concept in our product space analysis. 

Definition 3.9 (Edge factors). Let
$Y_I \times X = (Y_I \times X, 2^{Y_I} \otimes \mathcal{X}, \mu_{Y_I} \times \mu)$ be a
probability space as above. For any $e \subset I$, let $Y_e := \prod_{i \in e} Y_i$
and let $\pi_e : Y_I \times X \to Y_e \times X$ be the edge projection

$\pi_e((y_i)_{i \in I}, x) := ((y_i)_{i \in e}, x)$.

We then let $\mathcal{Y}_e$ be the factor
$(Y_e \times X, 2^{Y_e} \otimes \mathcal{X}, \pi_e)$ of $Y_I \times X$. We say that a
function $f : Y_I \times X \to \mathbb{R}$ is $e$-measurable if it is
$\mathcal{Y}_e$-measurable.

Example 3.10. Let $Y$ be a finite set, let $X = (X, \mathcal{X}, \mu)$ be a
probability space, and let $f : Y^3 \times X \to \mathbb{R}$ be a measurable function.
Then $f$ is $\{1, 3\}$-measurable if and only if it takes the form

$f(y_1, y_2, y_3, x) = f_{13}(y_1, y_3, x)$

for some measurable $f_{13} : Y^{\{1,3\}} \times X \to \mathbb{R}$. Similarly, $f$ is
$\{3\}$-measurable if and only if it takes the form

$f(y_1, y_2, y_3, x) = f_3(y_3, x)$

for some measurable $f_3 : Y^{\{3\}} \times X \to \mathbb{R}$.

Remark 3.11. In the graph and hypergraph theory language, an $e$-measurable set
should be regarded as an $|e|$-uniform, $e$-partite hypergraph on the vertex
classes $Y_i$ for $i \in e$. For instance, continuing the above example, if we let
$Y_1, Y_2, Y_3$ be three identical copies of $Y$, viewed as vertex sets, then if an
indicator function $1_{E_{13}}$ is $\{1, 3\}$-measurable then it can be viewed as
describing a bipartite graph connecting $Y_1$ and $Y_3$, whereas if an indicator
function $1_{E_3}$ is $\{3\}$-measurable it can be viewed as describing a set of
vertices in $Y_3$. Finally, a $\{1, 2, 3\}$-measurable indicator $1_{E_{123}}$ can be
viewed as a 3-uniform tripartite hypergraph connecting $Y_1$, $Y_2$, and $Y_3$.

We make the trivial remark that an $e$-measurable function is automatically
$e'$-measurable for any $e' \supset e$. For instance, all functions are
$I$-measurable.

Let $d \geq 1$ be an integer. We will informally refer to an edge factor
$\mathcal{Y}_e$ as having complexity $d$ if $|e| = d$. We would like to combine
together all the edge factors $\mathcal{Y}_e$ of a given complexity $d$ to create a
"complexity $d$ factor", which should morally form a tower of factors in $d$
analogous to the Furstenberg tower constructed for instance in [7]. However, one
has to take some care with this, since as $\sigma$-algebras (or even as algebras),
the edge factors $\mathcal{Y}_e$ of complexity $d$ generate the entire
$\sigma$-algebra $2^{Y_I} \otimes \mathcal{X}$. To obtain a meaningful concept of a
"complexity $d$ factor", then, we have to also limit the complexity of the
polynomial combinations of $e$-complexity functions we shall employ. This leads to
the following important definitions.

Definition 3.12 (Complexity). Let
$Y_I \times X = (Y_I \times X, 2^{Y_I} \otimes \mathcal{X}, \mu_{Y_I} \times \mu)$ be a
probability space as above. Let $1 \leq d \leq |I|$. A function
$g : \prod_{i \in I} Y_i \times X \to [-1, 1]$ is a primitive function of complexity at
most $d$ if it takes values in $[-1, 1]$ and is $e$-measurable for some
$e \subset I$ with $|e| \leq d$. A function
$g : \prod_{i \in I} Y_i \times X \to [-1, 1]$ is a basic function of complexity at
most $d$ if it is the product of finitely many primitive functions of complexity at
most $d$, or equivalently if it has a representation
$g = \prod_{e \subset I : |e| = d} g_e$ where each $g_e$ is $e$-measurable. A function
$g : Y_I \times X \to \mathbb{R}$ is an elementary function of complexity at most
$(d, J)$ for some integer $J \geq 1$ if it can be expressed as the sum of at most
$J$ basic functions of complexity at most $d$.

Example 3.13. We continue Example 3.10. If
$f_{12}, f_{13}, g_{12}, g_{23} : Y^2 \times X \to [-1, 1]$ are measurable functions,
then the function

$f(y_1, y_2, y_3, x) := f_{12}(y_1, y_2, x)$

is a primitive function of complexity at most 2,

$f'(y_1, y_2, y_3, x) := f_{12}(y_1, y_2, x) f_{13}(y_1, y_3, x)$

is a basic function of complexity at most 2, and

$f''(y_1, y_2, y_3, x) := f_{12}(y_1, y_2, x) f_{13}(y_1, y_3, x) + g_{12}(y_1, y_2, x) g_{23}(y_2, y_3, x)$

is an elementary function of complexity at most $(2, 2)$.

Remark 3.14. Observe that if $g$ and $g'$ are elementary functions of complexities
at most $(d, J)$ and $(d, J')$ respectively, then $g \pm g'$ and $gg'$ have
complexities at most $(d, J + J')$ and $(d, JJ')$ respectively; also, if $a$ is any
real number with $|a| \leq L$ for some integer $L$, then $ag$ has complexity at most
$(d, JL)$. Thus the space of functions of bounded complexity is morally an algebra.

3.15. Group structure. Graph and hypergraph theory takes place on vertex sets
$Y$ which have no algebraic structure. However, in our application these sets arise
from $\mathbb{Z}_P$ and will have two additional structures: the additive group
structure, and the Følner-type structure coming from the sets $[N]$ that one is
averaging over. To handle these structures we introduce two useful notations.

Definition 3.16 (Summation). Let $G = (G, +)$ be an additive group, and let $G^I$
be any finite Cartesian power of $G$. Given any vector
$v = (v_i)_{i \in I} \in G^I$, we define the sum $\Sigma(v) \in G$ of $v$ by
$\Sigma(v) := \sum_{i \in I} v_i$.

Clearly, $\Sigma$ is a homomorphism from $G^I$ to $G$. We shall usually apply this
notation with $G = \mathbb{Z}_P$ equal to a cyclic group.

To motivate our next definition, we recall the setup in Example 1.5. We rewrite
$A_N(f_1, f_2)(v_1, v_2)$ as

$A_N(f_1, f_2)(v_1, v_2) = \mathbb{E}_{n \in [N]} f_1(-v_2 - (-v_1 - v_2 - n), v_2) f_2(v_1, -v_1 - (-v_1 - v_2 - n))$.

The point of doing this is that we now see that the $f_1$ factor depends only on
$v_2$ and $-v_1 - v_2 - n$, while the $f_2$ factor depends only on $v_1$ and
$-v_1 - v_2 - n$. To make these dependencies even clearer, we introduce the
$\{2, 3\}$-measurable function

$g_{\{2,3\}}(v_1, v_2, v_3) := f_1(-v_2 - v_3, v_2)$

and the $\{1, 3\}$-measurable function

$g_{\{1,3\}}(v_1, v_2, v_3) := f_2(v_1, -v_1 - v_3)$

and observe the identity

(8) $A_N(f_1, f_2)(v_1, v_2) = \mathbb{E}_{n \in [N]} g_{\{2,3\}} g_{\{1,3\}}(v_1, v_2, -v_1 - v_2 - n)$.

Thus $A_N(f_1, f_2)$ can be viewed as an average of the product of the
$\{2, 3\}$-measurable function $g_{\{2,3\}}$ and the $\{1, 3\}$-measurable function
$g_{\{1,3\}}$ along the diagonal region
$\{(v_1, v_2, v_3) : v_3 \in -v_1 - v_2 - [N]\}$.
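Identity (8) can be checked numerically for small $P$. The Python sketch below is only a sanity check under stated assumptions: functions on $\mathbb{Z}_P^2$ are represented as $P \times P$ nested lists indexed by residues, and the names `direct_average` and `lifted_average` are hypothetical labels for the two sides of (8).

```python
def direct_average(f1, f2, P, N, v1, v2):
    """Left-hand side of (8): A_N(f1, f2)(v1, v2) as in Example 1.5."""
    total = 0.0
    for n in range(N):
        total += f1[(v1 + n) % P][v2] * f2[v1][(v2 + n) % P]
    return total / N


def lifted_average(f1, f2, P, N, v1, v2):
    """Right-hand side of (8): average of g_{2,3} * g_{1,3} along the
    diagonal v3 = -v1 - v2 - n in Z_P^3."""

    def g23(w1, w2, w3):
        # Depends only on the second and third coordinates.
        return f1[(-w2 - w3) % P][w2]

    def g13(w1, w2, w3):
        # Depends only on the first and third coordinates.
        return f2[w1][(-w1 - w3) % P]

    total = 0.0
    for n in range(N):
        v3 = (-v1 - v2 - n) % P
        total += g23(v1, v2, v3) * g13(v1, v2, v3)
    return total / N
```

The point of the lift is visible in the code: each of `g23` and `g13` ignores one of its three arguments, so the product has the "hypergraph" form in which every factor depends on a different pair of variables, and the arithmetic of the shifts is confined to the choice of `v3`.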

More generally, we can represent averages such as $A_N$ as diagonally averaged
projections by introducing the following operator.

Definition 3.17 (Diagonally averaged projection). Let $l \geq 1$ and $P \geq 1$ be
integers. Let $(X, \mathcal{X}, \mu)$ be a probability space. If
$f : \mathbb{Z}_P^{l+1} \times X \to \mathbb{R}$ is a measurable function and $N \geq 1$
is an integer, we define the diagonally averaged projection
$\Delta_N f : \mathbb{Z}_P^l \times X \to \mathbb{R}$ to be the function

$\Delta_N f(v, x) := \mathbb{E}_{n \in [N]} f((v, -\Sigma(v) - n), x)$

for all $v \in \mathbb{Z}_P^l$ and $x \in X$.

Remark 3.18. The space X is necessary to our argument for technical inductive 
reasons but should be neglected at a first reading. 

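The projection $\Delta_N$ is related to the averages $A_N$ in Definition 1.4 by the
easily verified identity

(9) $A_N(f_1, \ldots, f_l) = \Delta_N\Big( \prod_{i=1}^{l} g_{\{1, \ldots, l+1\} \setminus \{i\}} \Big)$

for any $f_1, \ldots, f_l : \mathbb{Z}_P^l \to \mathbb{R}$, where for each
$1 \leq i \leq l$, the function
$g_{\{1, \ldots, l+1\} \setminus \{i\}} : \mathbb{Z}_P^{l+1} \to \mathbb{R}$ is the
$\{1, \ldots, l+1\} \setminus \{i\}$-measurable function

$g_{\{1, \ldots, l+1\} \setminus \{i\}}(v_1, \ldots, v_{l+1}) := f_i\Big(v_1, \ldots, v_{i-1}, -\sum_{1 \leq j \leq l+1 : j \neq i} v_j, v_{i+1}, \ldots, v_l\Big)$.

One can verify that when $l = 2$, (9) collapses to (8).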

Remark 3.19. The above elementary arithmetic manipulations are essentially the 
same manipulations used in the hypergraph approach (see [3T], [T5], [TT], [53]) to 
Szemeredi's theorem [2 2) or the Furstenberg-Katznelson theorem [8], in order to 
rewrite the problem in a "hypergraph" form, by which we mean that the problem 
now concerns the averages of products of multiple functions, each of which depends 
on a different set of variables. (This corresponds to the problem in hypergraph 
theory of counting the number of instances of a small fixed hypergraph inside a 
much larger hypergraph.) 

The operator $A_N$ is clearly linear. For future reference we also observe the module
identity

(10)   $A_N(g_{\{1,\dots,l\}}\, h) = g_{\{1,\dots,l\}}\, A_N(h)$

for any $\{1,\dots,l\}$-measurable $g_{\{1,\dots,l\}} : \mathbb{Z}_P^{l+1} \times X \to \mathbb{R}$ and any $h : \mathbb{Z}_P^{l+1} \times X \to \mathbb{R}$,
where by abuse of notation we also view the $\{1,\dots,l\}$-measurable function $g_{\{1,\dots,l\}}$
as a function on $\mathbb{Z}_P^l \times X$.
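The module identity (10) can be sanity-checked numerically. The sketch below is only an illustration under stated assumptions: $X$ is a point, $[N] = \{1,\dots,N\}$, and all function names are hypothetical. It uses the fact that a $\{1,\dots,l\}$-measurable function does not read the last coordinate, which is the only coordinate the averaging variable $n$ touches.

```python
import itertools
import random
from fractions import Fraction


def diag_avg_projection(f, l, P, N):
    """A_N f(v) = E_{n in [N]} f((v, -Sigma(v) - n)) with [N] = {1, ..., N} assumed."""
    out = {}
    for v in itertools.product(range(P), repeat=l):
        s = sum(v)
        total = sum(f(v + ((-s - n) % P,)) for n in range(1, N + 1))
        out[v] = Fraction(total) / N
    return out


def check_module_identity(l=2, P=4, N=3, seed=0):
    """Check A_N(g h) = g A_N(h) when g ignores the last coordinate."""
    rng = random.Random(seed)
    g_table = {v: rng.randint(-2, 2)
               for v in itertools.product(range(P), repeat=l)}
    h_table = {v: rng.randint(-2, 2)
               for v in itertools.product(range(P), repeat=l + 1)}

    def g(v):  # {1,...,l}-measurable function on Z_P^(l+1)
        return g_table[v[:l]]

    def h(v):
        return h_table[v]

    lhs = diag_avg_projection(lambda v: g(v) * h(v), l, P, N)
    avg_h = diag_avg_projection(h, l, P, N)
    rhs = {v: g_table[v] * avg_h[v] for v in avg_h}
    return lhs == rhs
```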

4. A GENERALISATION OF THEOREM 1.6

We will prove Theorem 1.6 by an induction on the "complexity" of the functions
$f$ involved. As it turns out, a naive induction based on Theorem 1.6 in its current
form does not seem to work well, and so we shall instead use the following more
complicated generalisation of Theorem 1.6 to induct upon, in which functions such
as $f_1,\dots,f_l$ are allowed to be "random" rather than "deterministic" (or more
precisely, they are allowed to depend on an additional probability space $(X, \mathcal{X}, \mu)$),
and have varying levels of "complexity".

Specifically, we shall deduce Theorem 1.6 from the following more technical variant.

Theorem 4.1 (Finitary norm convergence, technical generalisation). Let $1 \le d \le l$,
$M_* \ge 1$, and $J \ge 1$ be integers. Let $F : \mathbb{N} \to \mathbb{N}$ be a function, and let $\varepsilon > 0$.
Then there exists an integer $M^* \ge M_*$ with the following property: If $P \ge 1$, if
$(X, \mathcal{X}, \mu)$ is a probability space, and $g : \mathbb{Z}_P^{l+1} \times X \to \mathbb{R}$ is an elementary function
of complexity at most $(d, J)$, then there exists an integer $M_* \le M \le M^*$ such that

(11)   $\|A_N(g) - A_{N'}(g)\|_{L^2(\mathbb{Z}_P^l \times X)} \le \varepsilon$

for all $M \le N, N' \le F(M)$.

Remark 4.2. This theorem is faintly reminiscent of the "hypergraph counting lem- 
mas" which appear for instance in [17], [TT], [23] . 

The deduction of Theorem 1.6 from Theorem 4.1 is immediate by specialising to
the case where $d = l$ and $M_* = J = 1$, where $X$ is a point, and $g$ is the function
$\prod_{i=1}^{l} g_{\{1,\dots,l+1\}\setminus\{i\}}$, which is a basic function of complexity $d$, and then using (9).

Remark 4.3. The main point of generalising Theorem 1.6 to Theorem 4.1 is that
it introduces a new parameter $d$: the maximum complexity of all the functions $g_e$
involved. We shall in fact prove Theorem 4.1 by an induction on this parameter
$d$ (keeping the dimension $l$ fixed). The addition of the probability space $(X, \mathcal{X}, \mu)$
is a technical convenience for us, as it allows us to perform a number of averaging
or probabilistic arguments without losing the ability to exploit the induction
hypothesis. The passage from one level of complexity $d$ to the next $d + 1$ is roughly
analogous to that of passing from one dynamical system to a primitive extension
in ergodic theory.

It remains to prove Theorem 4.1. This will be the purpose of the later sections.

5. The base case 

In this section we shall establish the base case^7 $d = 1$ of Theorem 4.1.

We first make some simple reductions. Firstly, we can reduce to the case $M_* = 1$,
by replacing $F(M)$ by the function $\tilde F(M) := F(\max(M, M_*))$, applying Theorem
4.1 with $\tilde F$ (and $M_*$ replaced by 1), and then replacing $M$ with $\max(M, M_*)$.

Next, we reduce to the case $J = 1$ by the following argument. Since $g : \mathbb{Z}_P^{l+1} \times X \to
\mathbb{R}$ has complexity at most $(d, J)$, we can write $g = g_1 + \dots + g_J$ where each
$g_k : \mathbb{Z}_P^{l+1} \times X \to \mathbb{R}$ is a basic function of complexity at most $d$. We then define
the extended probability space $\tilde X := X \times \{1,\dots,J\}$, where we give $\{1,\dots,J\}$
the discrete $\sigma$-algebra and uniform probability measure, and give $\tilde X$ the associated
product measure. We also define the function $\tilde g : \mathbb{Z}_P^{l+1} \times \tilde X \to [-1,1]$ by
$\tilde g(v,(x,k)) := g_k(v,x)$. One easily verifies from Definition 3.12 that $\tilde g$ is a basic
function of complexity at most $d$, and that we have the identity

$\|A_N(g) - A_{N'}(g)\|_{L^2(\mathbb{Z}_P^l \times X)} = J^{1/2}\, \|A_N(\tilde g) - A_{N'}(\tilde g)\|_{L^2(\mathbb{Z}_P^l \times \tilde X)}$

for all $N, N'$. Because of this, we see that we can reduce to the $J = 1$ case (after
adjusting $\varepsilon$ by a factor of $J^{1/2}$).

^7 In fact, one could incorporate this case into the inductive case, by making $d = 0$ the base
case instead, but we have chosen to do the $d = 1$ case in detail for didactic reasons, as it serves to
motivate the higher $d$ argument.

Since $J = 1$ and $d = 1$, we can now write $g = \prod_{i=1}^{l+1} g_{\{i\}}$ where each $g_{\{i\}} : \mathbb{Z}_P^{l+1} \times X \to
\mathbb{R}$ is $\{i\}$-measurable and takes values in $[-1,1]$. The contributions of the factors $g_{\{i\}}$
with $1 \le i \le l$ can be quickly discarded by using the module identity (10). Because
of this, we may assume without loss of generality that $\mathcal{I}$ consists only of the singleton
set $\{l+1\}$, thus we now just have a single function $g_{\{l+1\}} : \mathbb{Z}_P^{l+1} \times X \to [0,1]$. We
can use the $\{l+1\}$-measurability to write

$g_{\{l+1\}}(v_1,\dots,v_{l+1},x) = g(-v_{l+1}, x)$

where $g : \mathbb{Z}_P \times X \to [-1,1]$ is a measurable function. We now observe from
Definition 3.17 that the function $A_N(g_{\{l+1\}})((v_1,\dots,v_l), x)$ only depends on $v_1 +
\dots + v_l$ and $x$. Thus we may quotient out by the hyperplane $\{(v_1,\dots,v_l,v_{l+1}) \in
\mathbb{Z}_P^{l+1} : v_1 + \dots + v_l = 0\}$ and reduce $\mathbb{Z}_P^{l+1}$ to a one-dimensional group $\mathbb{Z}_P$. We are
now reduced to showing the following:

Theorem 5.1 (Finitary norm convergence, base case). Let $F : \mathbb{N} \to \mathbb{N}$ be a
function, and let $\varepsilon > 0$. Then there exists an integer $M^* \ge 1$ with the following
property: If $P \ge 1$, if $(X, \mathcal{X}, \mu)$ is a probability space, and $g : \mathbb{Z}_P \times X \to [0,1]$ is a
measurable function, then there exists an integer $1 \le M \le M^*$ such that

(12)   $\|S_N g - S_{N'} g\|_{L^2(\mathbb{Z}_P \times X)} \le \varepsilon$

for all $M \le N, N' \le F(M)$, where $S_N$ is the averaging operator $S_N g(v,x) :=
\mathbb{E}_{n \in [N]} g(v + n, x)$, and similarly for $S_{N'}$.

In fact, it suffices to show this theorem in the case when $X$ is a point:

Theorem 5.2 (Finitary norm convergence, simpler base case). Let $F : \mathbb{N} \to \mathbb{N}$ be
a function, and let $\varepsilon > 0$. Then there exists an integer $M^* \ge 1$ with the following
property: If $P \ge 1$, and $g : \mathbb{Z}_P \to [0,1]$, then there exists $1 \le M \le M^*$ such that

(13)   $\|S_N g - S_{N'} g\|_{L^2(\mathbb{Z}_P)} \le \varepsilon$

for all $M \le N, N' \le F(M)$, where $S_N$ is the averaging operator $S_N g(v) :=
\mathbb{E}_{n \in [N]} g(v + n)$, and similarly for $S_{N'}$.
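The conclusion of Theorem 5.2 is a statement about a whole window $M \le N, N' \le F(M)$ of scales, and for a concrete $g$ it can be explored by brute force. The sketch below is an illustration only: the helper names, the convention $[N] = \{1,\dots,N\}$ and the search cap `M_max` are assumptions introduced here, and the search says nothing about the uniform bound $M^*$ that the theorem provides.

```python
import math


def shift_average(g, N):
    """S_N g(v) = E_{n in [N]} g(v + n) on Z_P, with [N] = {1, ..., N} assumed."""
    P = len(g)
    return [sum(g[(v + n) % P] for n in range(1, N + 1)) / N for v in range(P)]


def l2_norm(f):
    """Norm in L^2(Z_P) with respect to the uniform probability measure."""
    return math.sqrt(sum(x * x for x in f) / len(f))


def find_stable_scale(g, F, eps, M_max):
    """Smallest M <= M_max with ||S_N g - S_N' g|| <= eps for all M <= N, N' <= F(M).

    Returns None if no such M is found below the (hypothetical) cap M_max.
    """
    for M in range(1, M_max + 1):
        averages = [shift_average(g, N) for N in range(M, F(M) + 1)]
        if all(l2_norm([a - b for a, b in zip(A, B)]) <= eps
               for A in averages for B in averages):
            return M
    return None
```

For instance, `find_stable_scale([1, 0] * 4, lambda M: 2 * M, 0.2, 10)` tests the alternating function on $\mathbb{Z}_8$ against the growth function $F(M) = 2M$.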

Indeed, Theorem 5.1 can be immediately deduced from Theorem 5.2 by applying
the finitary Lebesgue dominated convergence theorem, Theorem A.2, using the
functions

$f_{N,N'}(x) := \|S_N g(\cdot,x) - S_{N'} g(\cdot,x)\|^2_{L^2(\mathbb{Z}_P)} \in [0,1].$

Remark 5.3. Theorem 5.2 is nothing more than the $l = 1$ case of Theorem 1.6.






It remains to prove Theorem 5.2. We will not give the shortest proof of this theorem
here^8, but will instead give a more pedestrian argument which will motivate the
proof of the inductive case $d > 1$ of Theorem 4.1 in the next section.

A crucial notion to our argument is that of a basic anti-uniform function^9.

Definition 5.4 (Basic $\{1\}$-anti-uniform function). Let $M \ge 1$. A basic $\{1\}$-anti-uniform
function of scale $M$ is any function $\varphi : \mathbb{Z}_P \to \mathbb{R}$ of the form

$\varphi(v) := \mathbb{E}_{n \in [M]}\, b(v - n)$

for some function $b : \mathbb{Z}_P \to [-1,1]$.

Note that any basic $\{1\}$-anti-uniform function will itself take values between $-1$
and $1$. Furthermore, one easily verifies the Lipschitz property

(14)   $|\varphi(v + n) - \varphi(v)| \le \frac{2|n|}{M}$

for all $n \in \mathbb{Z}$ and $v \in \mathbb{Z}_P$, and all basic $\{1\}$-anti-uniform functions $\varphi$ of scale $M$.
Heuristically, basic $\{1\}$-anti-uniform functions should be viewed as essentially being
constant at scales below $M$. The relevance of basic $\{1\}$-anti-uniform functions to
Theorem 5.2 relies on the following simple lemma.
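The boundedness and Lipschitz properties of Definition 5.4 are easy to test numerically. The following sketch assumes $[M] = \{1,\dots,M\}$ and takes the Lipschitz constant to be $2|n|/M$, as in (14); the helper names are illustrative only.

```python
import random


def basic_anti_uniform(b, M):
    """phi(v) = E_{n in [M]} b(v - n) on Z_P, with [M] = {1, ..., M} assumed."""
    P = len(b)
    return [sum(b[(v - n) % P] for n in range(1, M + 1)) / M for v in range(P)]


def lipschitz_excess(phi, M, max_shift):
    """Largest value of |phi(v + n) - phi(v)| - 2|n|/M over v and |n| <= max_shift.

    The Lipschitz property holds on the tested range when this is at most 0
    (up to floating-point rounding).
    """
    P = len(phi)
    return max(abs(phi[(v + n) % P] - phi[v]) - 2 * abs(n) / M
               for v in range(P)
               for n in range(-max_shift, max_shift + 1))
```

The bound reflects the fact that the windows $v + n - [M]$ and $v - [M]$ differ in at most $2|n|$ points, each contributing at most $1/M$ to the difference of averages.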

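Lemma 5.5 (Lack of uniformity implies correlation with basic anti-uniform
function). Let $g : \mathbb{Z}_P \to [-1,1]$, $M \ge 1$, and $\varepsilon > 0$ be such that

(15)   $\|S_N g\|_{L^2(\mathbb{Z}_P)} \ge \varepsilon$

for some $N \ge \frac{10M}{\varepsilon^2}$. Then there exists a basic $\{1\}$-anti-uniform function $\varphi$ of scale
$M$ such that $|\langle g, \varphi\rangle_{L^2(\mathbb{Z}_P)}| \ge \varepsilon^2/2$.

Proof. We expand (15) as

$\mathbb{E}_{v \in \mathbb{Z}_P} \big(\mathbb{E}_{n \in [N]} g(v + n)\big)\big(\mathbb{E}_{n' \in [N]} g(v + n')\big) \ge \varepsilon^2.$

Observe from the hypothesis $N \ge \frac{10M}{\varepsilon^2}$ that

$\big|\mathbb{E}_{n' \in [N]} g(v + n') - \mathbb{E}_{n' \in [N]} \mathbb{E}_{m \in [M]} g(v + n' + m)\big| \le \frac{\varepsilon^2}{2}$

for all $v \in \mathbb{Z}_P$, and thus by the triangle inequality

$\big|\mathbb{E}_{v \in \mathbb{Z}_P} \big(\mathbb{E}_{n \in [N]} g(v + n)\big)\, \mathbb{E}_{n' \in [N]} \mathbb{E}_{m \in [M]} g(v + n' + m)\big| \ge \varepsilon^2/2.$

By the pigeonhole principle, we can thus find $n, n' \in [N]$ such that

$\big|\mathbb{E}_{v \in \mathbb{Z}_P}\, g(v + n)\, \mathbb{E}_{m \in [M]} g(v + n' + m)\big| \ge \varepsilon^2/2.$

We can rewrite this as $|\langle g, \varphi\rangle_{L^2(\mathbb{Z}_P)}| \ge \varepsilon^2/2$, where

$\varphi(v) := \mathbb{E}_{m \in [M]}\, b(v + m)$

and $b(v) := g(v + n' - n)$, and the claim follows. □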
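The pigeonhole step in the proof of Lemma 5.5 can be imitated by brute force: one searches over all pairs $n, n' \in [N]$ for the correlation of largest magnitude. The sketch below is a hedged illustration, not the paper's procedure. It assumes $[N] = \{1,\dots,N\}$, takes the threshold $N \ge 10M/\varepsilon^2$ as an assumption about the constant in the lemma, and chooses $\varepsilon$ to be the mean of $g$, which is a valid choice in (15) because $\|S_N g\|_{L^2} \ge |\mathbb{E}_v g(v)|$ for every $N$.

```python
import math
import random


def shift_average(g, N):
    """S_N g(v) = E_{n in [N]} g(v + n) on Z_P, with [N] = {1, ..., N} assumed."""
    P = len(g)
    return [sum(g[(v + n) % P] for n in range(1, N + 1)) / N for v in range(P)]


def l2_norm(f):
    """Norm in L^2(Z_P) with respect to the uniform probability measure."""
    return math.sqrt(sum(x * x for x in f) / len(f))


def best_correlation(g, N, M):
    """Max over n, n' in [N] of |E_v g(v + n) * E_{m in [M]} g(v + n' + m)|."""
    P = len(g)
    smoothed = shift_average(g, M)  # smoothed[u] = E_{m in [M]} g(u + m)
    return max(
        abs(sum(g[(v + n) % P] * smoothed[(v + n2) % P] for v in range(P)) / P)
        for n in range(1, N + 1)
        for n2 in range(1, N + 1))


def check_lemma_5_5(P=12, M=1, seed=0):
    """Return (hypothesis holds, conclusion holds) for a random g: Z_P -> [0, 1]."""
    rng = random.Random(seed)
    g = [rng.uniform(0.0, 1.0) for _ in range(P)]
    eps = sum(g) / P                       # lower bound for ||S_N g||, any N
    N = math.ceil(10 * M / eps ** 2)       # assumed threshold from the lemma
    hypothesis = l2_norm(shift_average(g, N)) >= eps - 1e-12
    conclusion = best_correlation(g, N, M) >= eps ** 2 / 2 - 1e-12
    return hypothesis, conclusion
```

The maximising pair $(n, n')$ determines the function $b(v) = g(v + n' - n)$ used at the end of the proof.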

^8 Indeed, one can use the Furstenberg correspondence principle to deduce Theorem 5.2 from
the mean ergodic theorem. See also [1] for a direct proof of this theorem.
^9 Our terminology is inspired by that in [9].

To exploit this lemma, we need to use the basic $\{1\}$-anti-uniform functions to build
various factors (the finitary analogue of characteristic factors), using the
construction in Lemma 3.6.

We turn to the details. Let $K \ge 1$ be the first integer larger than $\frac{10^6}{\varepsilon^4} + 2$, and
$\tilde F : \mathbb{N} \to \mathbb{N}$ be a function to be chosen later (it shall depend on $F$ and $\varepsilon$), such
that $\tilde F(M) \ge M$ for all $M$. Define the sequence

$1 \le M_1 \le M_2 \le \dots \le M_K$

recursively by $M_1 := 1$ and $M_{i+1} := \tilde F(M_i)$.

By greedily iterating Lemma 5.5 at a rapidly diminishing sequence of scales we shall
obtain a useful decomposition $g = g_{U^\perp} + g_U$ where $g_{U^\perp}$ is "low complexity" and $g_U$
is "negligible" at scales between $M_{k-1}$ and $M_k$ for some $k$, in the following precise
sense.

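Proposition 5.6 (Koopman-von Neumann type theorem). Let $g : \mathbb{Z}_P \to [0,1]$.
Then we can decompose $g = g_{U^\perp} + g_U$, where the two components $g_{U^\perp}, g_U : \mathbb{Z}_P \to
[-1,1]$ have the following properties.

(i) ($g_{U^\perp}$ anti-uniform) There exists an integer $2 \le k \le K$ and a basic
$\{1\}$-anti-uniform function $\varphi_j$ of scale $M_j$ for each $k \le j \le K$ such that $g_{U^\perp}$
is $\mathcal{Y}_{\ge k}$-measurable, where $\mathcal{Y}_{\ge k} := \mathcal{Y}_{\varepsilon^2/400}(\varphi_k) \vee \dots \vee \mathcal{Y}_{\varepsilon^2/400}(\varphi_K)$, and the
factors $\mathcal{Y}_{\varepsilon^2/400}(\varphi_j)$ are those defined in Lemma 3.6.

(ii) ($g_U$ uniform) We have

(16)   $\|S_N g_U\|_{L^2(\mathbb{Z}_P)} \le \varepsilon/10$

for all $N \ge \frac{1000 M_{k-1}}{\varepsilon^2}$.

Remark 5.7. See Proposition 8.1], [24, Theorem 3.9], [13, Theorem 4.7], or [TUl
Theorem 6.7] for similar results.

Proof. We perform the following algorithm:

• Step 0. Initialise $k = K + 1$.

• Step 1. Set $\mathcal{Y}_{\ge k} := \mathcal{Y}_{\varepsilon^2/400}(\varphi_k) \vee \dots \vee \mathcal{Y}_{\varepsilon^2/400}(\varphi_K)$, and then set $g_{U^\perp} :=
\mathbb{E}(g|\mathcal{Y}_{\ge k})$ and $g_U := g - g_{U^\perp}$. (Thus, initially, $g_{U^\perp}$ is simply the mean value
$\mathbb{E}_{v \in \mathbb{Z}_P} g(v)$ of $g$.)

• Step 2. If (16) holds for all $N \ge \frac{1000 M_{k-1}}{\varepsilon^2}$ then STOP. Otherwise, we
apply Lemma 5.5 to locate a basic $\{1\}$-anti-uniform function $\varphi_{k-1}$ of scale
$M_{k-1}$ such that $|\langle g_U, \varphi_{k-1}\rangle_{L^2(\mathbb{Z}_P)}| \ge \varepsilon^2/200$.

• Step 3. We decrement $k$ to $k - 1$. If $k = 1$ then we STOP with an error;
otherwise we return to Step 1.

If this algorithm terminates at some $k \ge 2$ then we are done, so suppose instead
for contradiction that the algorithm goes all the way down to $k = 1$. Then we have
constructed $\varphi_1,\dots,\varphi_K$ such that

$\big|\langle g - \mathbb{E}(g|\mathcal{Y}_{\ge j+1}), \varphi_j\rangle_{L^2(\mathbb{Z}_P)}\big| \ge \varepsilon^2/200$

for all $1 \le j \le K$. On the other hand, by Lemma 3.6(i) we have

$\|\varphi_j - \mathbb{E}(\varphi_j|\mathcal{Y}_{\ge j})\|_{L^\infty(\mathbb{Z}_P)} \le \varepsilon^2/400$

and hence by the triangle inequality (and the fact that $g$ takes values in $[0,1]$) we
have

$\big|\langle g - \mathbb{E}(g|\mathcal{Y}_{\ge j+1}), \mathbb{E}(\varphi_j|\mathcal{Y}_{\ge j})\rangle_{L^2(\mathbb{Z}_P)}\big| \ge \varepsilon^2/400.$

We can rewrite the left-hand side as

$\big|\langle \mathbb{E}(g|\mathcal{Y}_{\ge j}) - \mathbb{E}(g|\mathcal{Y}_{\ge j+1}), \mathbb{E}(\varphi_j|\mathcal{Y}_{\ge j})\rangle_{L^2(\mathbb{Z}_P)}\big|$

and thus by Cauchy-Schwarz

$\|\mathbb{E}(g|\mathcal{Y}_{\ge j}) - \mathbb{E}(g|\mathcal{Y}_{\ge j+1})\|_{L^2(\mathbb{Z}_P)} \ge \varepsilon^2/400$

and thus by Pythagoras' theorem

$\|\mathbb{E}(g|\mathcal{Y}_{\ge j})\|^2_{L^2(\mathbb{Z}_P)} \ge \|\mathbb{E}(g|\mathcal{Y}_{\ge j+1})\|^2_{L^2(\mathbb{Z}_P)} + \frac{\varepsilon^4}{10^6}$

for all $1 \le j \le K$. On the other hand, the quantities $\|\mathbb{E}(g|\mathcal{Y}_{\ge j})\|^2_{L^2(\mathbb{Z}_P)}$ clearly
range between $0$ and $1$. These facts contradict the definition of $K$. The claim
follows. □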
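The energy increment at the end of the proof rests on Pythagoras' theorem for nested factors: if one partition refines another, the gain in squared $L^2$ norm of the conditional expectation equals the squared distance between the two conditional expectations. The sketch below checks this on $\mathbb{Z}_P$ with the uniform probability measure. Representing a factor by a list of atom labels is an illustrative device and is not the construction of Lemma 3.6.

```python
def cond_exp(g, labels):
    """Conditional expectation of g onto the partition of Z_P given by labels."""
    sums, counts = {}, {}
    for value, lab in zip(g, labels):
        sums[lab] = sums.get(lab, 0.0) + value
        counts[lab] = counts.get(lab, 0) + 1
    return [sums[lab] / counts[lab] for lab in labels]


def sq_norm(f):
    """Squared norm in L^2(Z_P) with respect to the uniform probability measure."""
    return sum(x * x for x in f) / len(f)


def energy_gap(g, coarse, fine):
    """Return (energy increment, squared distance) for nested partitions.

    `fine` must refine `coarse`; by Pythagoras the two numbers then agree.
    """
    e_coarse = cond_exp(g, coarse)
    e_fine = cond_exp(g, fine)
    increment = sq_norm(e_fine) - sq_norm(e_coarse)
    distance = sq_norm([a - b for a, b in zip(e_fine, e_coarse)])
    return increment, distance
```

In the proof, the coarse factor plays the role of $\mathcal{Y}_{\ge j+1}$ and the fine one that of $\mathcal{Y}_{\ge j}$, so a lower bound on the distance becomes a lower bound on the energy increment.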

We apply this proposition to obtain $2 \le k \le K$, basic $\{1\}$-anti-uniform functions
$\varphi_k,\dots,\varphi_K$, and a decomposition $g = g_{U^\perp} + g_U$ with the stated properties.

Let $M$ be the first integer greater than $\frac{1000 M_{k-1}}{\varepsilon^2}$, and let $M \le N, N' \le F(M)$.
To prove Theorem 5.2 it will suffice to show that, for $\tilde F$ chosen appropriately
depending on $F$ and $\varepsilon$,

$\|S_N g - S_{N'} g\|_{L^2(\mathbb{Z}_P)} \le \varepsilon,$

since $M$ will be bounded by some quantity $M^*$ depending on $\varepsilon$ and $\tilde F$, and thus
ultimately on $F$ and $\varepsilon$. From (16) we already have

$\|S_N g_U\|_{L^2(\mathbb{Z}_P)},\ \|S_{N'} g_U\|_{L^2(\mathbb{Z}_P)} \le \varepsilon/10$

so by the triangle inequality it will suffice to show that

(17)   $\|S_N g_{U^\perp} - S_{N'} g_{U^\perp}\|_{L^2(\mathbb{Z}_P)} \le \varepsilon/10.$

Now the function $g_{U^\perp}$ takes values between $0$ and $1$, and is measurable with
respect to the factor $\mathcal{Y}_{\ge k}$. From Lemma 3.6 this factor has $O_{K,\varepsilon}(1) = O_\varepsilon(1)$
atoms, each of which is the intersection of atoms coming from the individual
factors $\mathcal{Y}_{\varepsilon^2/400}(\varphi_k),\dots,\mathcal{Y}_{\varepsilon^2/400}(\varphi_K)$. Applying Lemma 3.3 repeatedly, we thus see for
every $\eta_1 > 0$ there exists a polynomial $\Psi : \mathbb{R}^{K-k+1} \to \mathbb{R}$ of $K - k + 1$ variables
with degree and coefficients $O_{K,\varepsilon,\eta_1}(1) = O_{\varepsilon,\eta_1}(1)$ such that

$\|g_{U^\perp} - \Psi(\varphi_k,\dots,\varphi_K)\|_{L^1(\mathbb{Z}_P)} \ll_\varepsilon \eta_1$

and

$\|g_{U^\perp} - \Psi(\varphi_k,\dots,\varphi_K)\|_{L^\infty(\mathbb{Z}_P)} \ll_\varepsilon 1.$

By Hölder's inequality we conclude that

$\|g_{U^\perp} - \Psi(\varphi_k,\dots,\varphi_K)\|^2_{L^2(\mathbb{Z}_P)} \ll_\varepsilon \eta_1;$

since $S_N$ is a contraction on $L^2$, we conclude that

$\|S_N g_{U^\perp} - S_N \Psi(\varphi_k,\dots,\varphi_K)\|^2_{L^2(\mathbb{Z}_P)} \ll_\varepsilon \eta_1.$

Thus, if we choose $\eta_1$ sufficiently small depending on $\varepsilon$, we see from the triangle
inequality that (17) will follow if we can show

(18)   $\|S_N \Psi(\varphi_k,\dots,\varphi_K) - S_{N'} \Psi(\varphi_k,\dots,\varphi_K)\|_{L^2(\mathbb{Z}_P)} \le \varepsilon/20.$

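We now fix $\eta_1 = \eta_1(\varepsilon)$ so that the above argument is valid. From (14) (and the
monotonicity of the $M_j$) we have

$\varphi_j(v + n) = \varphi_j(v) + O\Big(\frac{F(M)}{\tilde F(M_{k-1})}\Big)$

for all $k \le j \le K$ and $n \in [N] \cup [N']$. By the bounds on $\Psi$ (and the fact that the
$\varphi_j$ have magnitude $O(1)$) we conclude that

$\Psi(\varphi_k,\dots,\varphi_K)(v + n) = \Psi(\varphi_k,\dots,\varphi_K)(v) + O_\varepsilon\Big(\frac{F(M)}{\tilde F(M_{k-1})}\Big);$

averaging in $n$, we obtain

$S_N \Psi(\varphi_k,\dots,\varphi_K),\ S_{N'} \Psi(\varphi_k,\dots,\varphi_K) = \Psi(\varphi_k,\dots,\varphi_K) + O_\varepsilon\Big(\frac{F(M)}{\tilde F(M_{k-1})}\Big).$

Thus we can bound the left-hand side of (18) by $O_\varepsilon\big(\frac{F(M)}{\tilde F(M_{k-1})}\big)$. If we then choose
$\tilde F$ to grow sufficiently quickly depending on $F$ and $\varepsilon$ we obtain the desired claim
(setting $M^* := M_K$). This concludes the proof of Theorem 5.2 and hence the
$d = 1$ case of Theorem 4.1.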

6. The inductive case 

To complete the proof of Theorem 4.1 (and thus Theorem 1.1) it remains to verify
the inductive step of Theorem 4.1, i.e. to deduce Theorem 4.1 for some fixed $d > 1$
assuming inductively that this theorem has already been established for all smaller
values of $d$. Fortunately it turns out that the arguments of the preceding section
extend without much difficulty to handle this case. The one twist will be that
the basic anti-uniform functions will have higher complexity (they are averages of
complexity $d - 1$), and in particular will not obey the simple Lipschitz property
(14); however, they will be approximable by functions of complexity $d - 1$ or less
and will thus be treatable by the induction hypothesis^10.

^10 In ergodic theory terminology, the complexity $d$ case (with $J = 1$) is being viewed as a kind
of "weakly mixing extension" of the complexity $d - 1$ case (with $J \ge 1$), with the latter serving
as a kind of "characteristic factor" for the former. Similarly, the $J > 1$ case at a given complexity
is a kind of "finite rank extension" of the $J = 1$ case.

Before we begin the rigorous argument, let us give an informal discussion to try to
motivate the strategy of proof. For simplicity let us just discuss the case $d = 2$ and
$l = 3$, with $X$ equal to a point, and consider the convergence of averages $A_N(f)$,
where $f$ has complexity at most $(2, 1)$, and specifically $f$ takes the form

$f(v_1,v_2,v_3) = g_{\{1,2\}}(v_1,v_2)\, g_{\{2,3\}}(v_2,v_3)\, g_{\{3,1\}}(v_3,v_1)$

for some functions^11 $g_{\{1,2\}}, g_{\{2,3\}}, g_{\{3,1\}} : \mathbb{Z}_P^2 \to [-1,1]$. Then the average $A_N(f)$
can be written explicitly as

$A_N(f)(v_1,v_2) = \mathbb{E}_{n \in [N]}\, g_{\{1,2\}}(v_1,v_2)\, g_{\{2,3\}}(v_2, -v_1-v_2-n)\, g_{\{3,1\}}(-v_1-v_2-n, v_1).$

The $g_{\{1,2\}}(v_1,v_2)$ factor comes out of the average (cf. (10)) and is therefore
uninteresting. We shall thus assume $g_{\{1,2\}} = 1$ and so

(19)   $A_N(f)(v_1,v_2) = \mathbb{E}_{n \in [N]}\, g_{\{2,3\}}(v_2, -v_1-v_2-n)\, g_{\{3,1\}}(-v_1-v_2-n, v_1).$

Now suppose that we are in the "compact" or "finite rank" case in which $g_{\{2,3\}}$ and
$g_{\{3,1\}}$ were actually complexity 1 objects, for instance suppose we had

$g_{\{2,3\}}(v_2,v_3) = h_2(v_2)h_3(v_3)$ and $g_{\{3,1\}}(v_3,v_1) = k_3(v_3)k_1(v_1)$

for some functions $h_2, h_3, k_3, k_1 : \mathbb{Z}_P \to [-1,1]$. Then the average simplifies to

$A_N(f)(v_1,v_2) = h_2(v_2)\, k_1(v_1)\, \mathbb{E}_{n \in [N]}\, h_3 k_3(-v_1 - v_2 - n).$

The convergence of this average can then be easily deduced from the $d = 1$ theory of
the previous section. Similarly we expect to be able to handle the case when $g_{\{2,3\}}$
and $g_{\{3,1\}}$ are of complexity $(1, J)$ for some bounded $J$, i.e. they are a bounded
combination of tensor products of functions of one variable.

Now let us consider the opposing case in which $g_{\{2,3\}}$ (say) does not behave at all
like a tensor product of functions of one variable, so much so that it behaves
"orthogonally" to any such tensor product. A little more precisely, let us suppose that
correlations of the form

(20)   $\mathbb{E}_{v_2 \in w_2 + [N'],\, v_3 \in w_3 + [N']}\; g_{\{2,3\}}(v_2,v_3)\, h_2(v_2)\, h_3(v_3)$

are always small for "generic" base points $w_2, w_3 \in \mathbb{Z}_P$ and arbitrary bounded
functions $h_2, h_3 : \mathbb{Z}_P \to [-1,1]$ (we will not attempt to make these assertions
rigorous here), and for various values of $N'$ which we shall leave vague here. In
that "weakly mixing" case, it turns out that the averages $A_N(f)$ are in fact quite
small in norm. To see this, let us write

$\|A_N(f)\|^2_{L^2} = \frac{1}{P^2} \sum_{v_1,v_2 \in \mathbb{Z}_P} A_N(f)(v_1,v_2)\, \mathbb{E}_{n \in [N]}\, g_{\{2,3\}}(v_2,-v_1-v_2-n)\, g_{\{3,1\}}(-v_1-v_2-n,v_1)$

and then rewrite the right-hand side as

$\frac{1}{NP^2} \sum_{v_1 \in \mathbb{Z}_P}\ \sum_{v_2,v_3 \in \mathbb{Z}_P:\ -\Sigma(v_1,v_2,v_3) \in [N]} g_{\{2,3\}}(v_2,v_3)\, A_N(f)(v_1,v_2)\, g_{\{3,1\}}(v_3,v_1).$

But observe that for any fixed $v_1$, the inner sum is (up to some normalising factors)
the correlation between $g_{\{2,3\}}(v_2,v_3)$ and a tensor product of functions of $v_2$ and
$v_3$ separately. This sum is over a diagonal region $\{(v_2,v_3) : -\Sigma(v_1,v_2,v_3) \in [N]\}$,
but we can approximately split this region into squares of length $N'$ for some $N'$ a
bit smaller than $N$ and use the smallness of (20) to then conclude that $A_N(f)$ is
small in $L^2$.

^11 To be completely consistent with our other notation, we should actually make each of
$g_{\{1,2\}}, g_{\{2,3\}}, g_{\{3,1\}}$ equal to a function on $\mathbb{Z}_P^3$ which is constant in one of the variables $v_1, v_2, v_3$,
but we will not do so here to simplify the formulas slightly.



To summarise so far, we have given heuristics to justify some sort of convergence
in the extreme cases when both $g_{\{2,3\}}$ and $g_{\{3,1\}}$ are "compact", and when at least
one of $g_{\{2,3\}}$ and $g_{\{3,1\}}$ is "weakly mixing". The rest of the proof then hinges on
a Koopman-von Neumann type structure theorem (as in the previous section) that
allows us to split arbitrary functions into compact and weakly mixing components,
allowing us to deduce the general case from these two special cases.

We turn to the details. Fix $d > 1$, and assume inductively that Theorem 4.1 has
already been established for all smaller values of $d$. We allow all implied constants
to depend on $l$ and $d$. By increasing $F$ if necessary we may assume that $F(M) \ge M$
for all $M$.

We can first repeat several of the reductions already employed in the previous
section. For instance, we can quickly reduce to the case $M_* = J = 1$ by using
exactly the same arguments used in the $d = 1$ case. Similarly, by using Theorem
A.2 as before we may reduce $X$ to a point. If we write $g = \prod_{e \subset \{1,\dots,l+1\}:\, |e| = d} g_e$,
where $g_e : \mathbb{Z}_P^{l+1} \to [-1,1]$ is $e$-measurable, then as before the contribution of those
$e$ for which $e \subset \{1,\dots,l\}$ can be absorbed using the module identity (10). Our
task is now to establish the following.



Theorem 6.1 (Finitary norm convergence, inductive step). Let $1 < d \le l$, and
suppose that Theorem 4.1 has already been established for smaller values of $d$. Let
$\mathcal{I}$ be the collection of all subsets $e$ of $\{1,\dots,l+1\}$ such that $|e| = d$ and $l + 1 \in e$.
Let $F : \mathbb{N} \to \mathbb{N}$ be a function, and let $\varepsilon > 0$. Then there exists an integer $M^* \ge 1$
with the following property: If $P \ge 1$, and $g_e : \mathbb{Z}_P^{l+1} \to [-1,1]$ is $e$-measurable for
all $e \in \mathcal{I}$, then there exists an integer $1 \le M \le M^*$ such that

(21)   $\Big\|A_N\Big(\prod_{e \in \mathcal{I}} g_e\Big) - A_{N'}\Big(\prod_{e \in \mathcal{I}} g_e\Big)\Big\|_{L^2(\mathbb{Z}_P^l)} \le \varepsilon$

for all $M \le N, N' \le F(M)$.

As in the previous section, a key concept will be that of an anti-uniform function, 
although now this function will be adapted to the index set e. 

Definition 6.2 (Basic $e$-anti-uniform function). Let $M \ge 1$, and let $e \in \mathcal{I}$. A basic
$e$-anti-uniform function of scale $M$ is any function $\varphi_e : \mathbb{Z}_P^{l+1} \to \mathbb{R}$ of the form

$\varphi_e(v) := \mathbb{E}_{m \in [M]} \prod_{i \in e} b_i\big(v_{e \setminus \{i\}},\, \Sigma(v_e) + m\big)$

where for each $i \in e$, $b_i : \mathbb{Z}_P^{e \setminus \{i\}} \times \mathbb{Z}_P \to [-1,1]$ is a function, and for each $v =
(v_1,\dots,v_{l+1}) \in \mathbb{Z}_P^{l+1}$, $v_e := (v_j)_{j \in e} \in \mathbb{Z}_P^e$ and $v_{e \setminus \{i\}} := (v_j)_{j \in e \setminus \{i\}} \in \mathbb{Z}_P^{e \setminus \{i\}}$ are
projections of $v$.

Observe that this definition generalises Definition 5.4 which considered the case
$l = 0$ and $e = \{1\}$. Also note that any basic $e$-anti-uniform function $\varphi_e$ of scale $M$
is going to be $e$-measurable and take values in $[-1,1]$.

Example 6.3. If $l = 2$ and $e = \{1,2\}$, and $b_1, b_2 : \mathbb{Z}_P^2 \to [-1,1]$, then any function
of the form

$\varphi_e(v_1,v_2,v_3) = \mathbb{E}_{m \in [M]}\, b_1(v_2, v_1 + v_2 + m)\, b_2(v_1, v_1 + v_2 + m)$

is a basic $e$-anti-uniform function of scale $M$.
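Example 6.3 can be tabulated directly on a small group. The sketch below is only an illustration with hypothetical names and the assumed convention $[M] = \{1,\dots,M\}$: it builds the function from two tables `b1`, `b2` and lets one confirm the two properties noted after Definition 6.2, namely that the values lie in $[-1,1]$ and that the function is $e$-measurable (here, independent of $v_3$).

```python
import itertools


def example_6_3(b1, b2, P, M):
    """phi_e(v1, v2, v3) = E_{m in [M]} b1(v2, v1+v2+m) * b2(v1, v1+v2+m).

    b1 and b2 are P x P tables with entries in [-1, 1]; [M] = {1, ..., M} assumed.
    The result is a dict indexed by (v1, v2, v3) in Z_P^3.
    """
    phi = {}
    for v1, v2, v3 in itertools.product(range(P), repeat=3):
        s = v1 + v2
        phi[(v1, v2, v3)] = sum(
            b1[v2][(s + m) % P] * b2[v1][(s + m) % P]
            for m in range(1, M + 1)) / M
    return phi


def is_independent_of_v3(phi, P):
    """True when phi(v1, v2, v3) does not depend on v3, i.e. is {1,2}-measurable."""
    return all(phi[(v1, v2, v3)] == phi[(v1, v2, 0)]
               for v1, v2, v3 in itertools.product(range(P), repeat=3))
```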



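We have a generalisation of Lemma 5.5.

Lemma 6.4 (Lack of uniformity implies correlation with basic anti-uniform
function). Let $M \ge 1$ and $\varepsilon > 0$. For each $e \in \mathcal{I}$, let $g_e : \mathbb{Z}_P^{l+1} \to [-1,1]$ be an
$e$-measurable function, and suppose that

(22)   $\Big\|A_N\Big(\prod_{e \in \mathcal{I}} g_e\Big)\Big\|_{L^2(\mathbb{Z}_P^l)} \ge \varepsilon$

for some $N \ge \frac{10M}{\varepsilon^2}$. Then for every $e_0 \in \mathcal{I}$, there exists a basic $e_0$-anti-uniform
function $\varphi_{e_0}$ of scale $M$ such that $|\langle g_{e_0}, \varphi_{e_0}\rangle_{L^2(\mathbb{Z}_P^{l+1})}| \ge \varepsilon^2/2$.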



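Proof. From Definition 3.17 we have

$A_N\Big(\prod_{e \in \mathcal{I}} g_e\Big)(v_1,\dots,v_l) := \frac{1}{N} \sum_{v_{l+1}:\ -\Sigma(v) \in [N]}\ \prod_{e \in \mathcal{I}} g_e(v)$

where $v = (v_1,\dots,v_{l+1})$. Squaring (22), we obtain

$\sum_{(v_1,\dots,v_l) \in \mathbb{Z}_P^l} A_N\Big(\prod_{e \in \mathcal{I}} g_e\Big)(v_1,\dots,v_l) \sum_{v_{l+1}:\ -\Sigma(v) \in [N]}\ \prod_{e \in \mathcal{I}} g_e(v) \ge \varepsilon^2 N P^l;$

if we then let $h : \mathbb{Z}_P^{l+1} \to [-1,1]$ be the function

$h(v_1,\dots,v_{l+1}) := A_N\Big(\prod_{e \in \mathcal{I}} g_e\Big)(v_1,\dots,v_l)$

we then obtain

$\sum_{v \in \mathbb{Z}_P^{l+1}:\ -\Sigma(v) \in [N]} h(v) \prod_{e \in \mathcal{I}} g_e(v) \ge \varepsilon^2 N P^l.$

Observe that for each $e \in \mathcal{I} \setminus \{e_0\}$, $g_e$ will be $\{1,\dots,l+1\} \setminus \{i\}$-measurable for some
$i \in e_0$. The function $h$ obeys the same property; indeed, $h$ is clearly
$\{1,\dots,l+1\} \setminus \{l+1\}$-measurable, and $l + 1$ lies in $e_0$ by definition of $\mathcal{I}$. We may
therefore write

$h(v) \prod_{e \in \mathcal{I}} g_e(v) = g_{e_0}(v) \prod_{i \in e_0} b_i(v)$

where $b_i : \mathbb{Z}_P^{l+1} \to [-1,1]$ is a $\{1,\dots,l+1\} \setminus \{i\}$-measurable function. Thus we have

$\sum_{v_{e_0} \in \mathbb{Z}_P^{e_0}} g_{e_0}(v_{e_0}) \sum_{v_{e_0^c} \in \mathbb{Z}_P^{e_0^c}:\ -\Sigma(v_{e_0}) - \Sigma(v_{e_0^c}) \in [N]}\ \prod_{i \in e_0} b_i\big(v_{e_0 \setminus \{i\}}, v_{e_0^c}\big) \ge \varepsilon^2 N P^l$

where $e_0^c := \{1,\dots,l+1\} \setminus e_0$, and we abuse notation by identifying the $e_0$-measurable
function $g_{e_0}$ with a function on $\mathbb{Z}_P^{e_0}$. Since $e_0^c$ has cardinality $l + 1 - d > 0$, we can
write $e_0^c = \{j\} \cup f$ for some $j \in \{1,\dots,l+1\}$ and some $f \subset \{1,\dots,l+1\}$ of
cardinality $l - d$. By the pigeonhole principle, we may thus find $v_f \in \mathbb{Z}_P^f$ such that

$\sum_{v_{e_0} \in \mathbb{Z}_P^{e_0}} g_{e_0}(v_{e_0}) \sum_{v_j \in \mathbb{Z}_P:\ -\Sigma(v_{e_0}) - \Sigma(v_f) - v_j \in [N]}\ \prod_{i \in e_0} b_i\big(v_{e_0 \setminus \{i\}}, v_j, v_f\big) \ge \varepsilon^2 N P^d.$

Fix this $v_f$. Since $N \ge \frac{10M}{\varepsilon^2}$, we can shift $[N]$ by $m$ for any $m \in [M]$ and only pick
up an error of at most $\varepsilon^2 N P^d/2$, thus

$\sum_{v_{e_0} \in \mathbb{Z}_P^{e_0}} g_{e_0}(v_{e_0}) \sum_{v_j \in \mathbb{Z}_P:\ -\Sigma(v_{e_0}) - \Sigma(v_f) - v_j \in [N] + m}\ \prod_{i \in e_0} b_i\big(v_{e_0 \setminus \{i\}}, v_j, v_f\big) \ge \varepsilon^2 N P^d/2$

for all $m \in [M]$. Summing this over all $m \in [M]$ we obtain

$\sum_{n \in [N]}\ \sum_{v_{e_0} \in \mathbb{Z}_P^{e_0}} g_{e_0}(v_{e_0}) \sum_{v_j \in \mathbb{Z}_P:\ -\Sigma(v_{e_0}) - \Sigma(v_f) - v_j \in n + [M]}\ \prod_{i \in e_0} b_i\big(v_{e_0 \setminus \{i\}}, v_j, v_f\big) \ge \varepsilon^2 N M P^d/2.$

By the pigeonhole principle we may thus find $n \in [N]$ such that

$\sum_{v_{e_0} \in \mathbb{Z}_P^{e_0}} g_{e_0}(v_{e_0}) \sum_{v_j \in \mathbb{Z}_P:\ -\Sigma(v_{e_0}) - \Sigma(v_f) - v_j \in n + [M]}\ \prod_{i \in e_0} b_i\big(v_{e_0 \setminus \{i\}}, v_j, v_f\big) \ge \varepsilon^2 M P^d/2.$

If we define $\tilde b_i : \mathbb{Z}_P^{e_0 \setminus \{i\}} \times \mathbb{Z}_P \to [-1,1]$ to be the function

$\tilde b_i\big(v_{e_0 \setminus \{i\}}, w\big) := b_i\big(v_{e_0 \setminus \{i\}},\, -\Sigma(v_f) - n - w,\, v_f\big)$

then we have

$\sum_{v_{e_0} \in \mathbb{Z}_P^{e_0}} g_{e_0}(v_{e_0}) \sum_{m \in [M]}\ \prod_{i \in e_0} \tilde b_i\big(v_{e_0 \setminus \{i\}},\, \Sigma(v_{e_0}) + m\big) \ge \varepsilon^2 M P^d/2,$

or in other words

$\mathbb{E}_{v_{e_0} \in \mathbb{Z}_P^{e_0}}\, g_{e_0}(v_{e_0})\, \mathbb{E}_{m \in [M]} \prod_{i \in e_0} \tilde b_i\big(v_{e_0 \setminus \{i\}},\, \Sigma(v_{e_0}) + m\big) \ge \varepsilon^2/2.$

If we now add some dummy variables $v_k$ for all $k \in e_0^c$, we obtain the claim. □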



Now let $K \ge 1$ be the first integer larger than $\frac{10^6 |\mathcal{I}|^5}{\varepsilon^4} + 2$, and $\tilde F : \mathbb{N} \to \mathbb{N}$ be a
function to be chosen later (it shall depend on $F$ and $\varepsilon$), such that $\tilde F(M) \ge M$ for
all $M$. Once again, we define the sequence

$1 \le M_1 \le M_2 \le \dots \le M_K$

recursively by $M_1 := 1$ and $M_{i+1} := \tilde F(M_i)$. By running the proof of Proposition
5.6 "in parallel" for each of the $g_e$ simultaneously, we now show

Proposition 6.5 (Koopman-von Neumann type theorem). For each $e \in \mathcal{I}$, let
$g_e : \mathbb{Z}_P^{l+1} \to [0,1]$ be an $e$-measurable function. Then there exists $2 \le k \le K + 1$ and
decompositions $g_e = g_{e,U^\perp} + g_{e,U}$ for all $e \in \mathcal{I}$, where $g_{e,U^\perp}, g_{e,U} : \mathbb{Z}_P^{l+1} \to [-1,1]$
are $e$-measurable functions with the following properties.

(i) ($g_{e,U^\perp}$ anti-uniform) For each $e \in \mathcal{I}$, there exists a basic $e$-anti-uniform
function $\varphi_{e,j}$ of scale $M_j$ for each $k \le j \le K$ such that $g_{e,U^\perp}$ is
$\mathcal{Y}_{e,\ge k}$-measurable, where $\mathcal{Y}_{e,\ge k} := \mathcal{Y}_{\varepsilon^2/(400|\mathcal{I}|^2)}(\varphi_{e,k}) \vee \dots \vee \mathcal{Y}_{\varepsilon^2/(400|\mathcal{I}|^2)}(\varphi_{e,K})$.

(ii) ($g_{e,U}$ uniform) For any $e \in \mathcal{I}$, we have

(23)   $\Big\|A_N\Big(g_{e,U} \prod_{e' \in \mathcal{I} \setminus \{e\}} h_{e'}\Big)\Big\|_{L^2(\mathbb{Z}_P^l)} \le \frac{\varepsilon}{10|\mathcal{I}|}$

for all $N \ge \frac{1000 |\mathcal{I}|^2 M_{k-1}}{\varepsilon^2}$ and all $e'$-measurable $h_{e'} : \mathbb{Z}_P^{l+1} \to [-1,1]$ for
$e' \in \mathcal{I} \setminus \{e\}$.

Remark 6.6. This result is a "weak hypergraph regularity lemma", akin to the
"weak regularity lemma" of Frieze and Kannan [5]. One can also develop stronger
regularity lemmas (in which one obtains local regularity and not just global
regularity), similar for instance to those in [53], by replacing the "single-loop" greedy
algorithm argument presented here by a "double-loop" one, but they will not be
necessary for our purposes here.



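Proof. The argument shall closely follow the proof of Proposition 5.6. We perform
the following algorithm:

• Step 0. Initialise $k = K + 1$.

• Step 1. For each $e \in \mathcal{I}$, set $\mathcal{Y}_{e,\ge k} := \mathcal{Y}_{\varepsilon^2/(400|\mathcal{I}|^2)}(\varphi_{e,k}) \vee \dots \vee \mathcal{Y}_{\varepsilon^2/(400|\mathcal{I}|^2)}(\varphi_{e,K})$,
$g_{e,U^\perp} := \mathbb{E}(g_e|\mathcal{Y}_{e,\ge k})$ and $g_{e,U} := g_e - g_{e,U^\perp}$.

• Step 2. If (23) holds for all $N \ge \frac{1000 |\mathcal{I}|^2 M_{k-1}}{\varepsilon^2}$, all $e \in \mathcal{I}$, and all
$e'$-measurable $h_{e'} : \mathbb{Z}_P^{l+1} \to [-1,1]$ then STOP. Otherwise, we apply Lemma
6.4 to locate an $e \in \mathcal{I}$ and a basic $e$-anti-uniform function $\varphi_{e,k-1}$ of scale
$M_{k-1}$ such that $|\langle g_{e,U}, \varphi_{e,k-1}\rangle_{L^2(\mathbb{Z}_P^{l+1})}| \ge \varepsilon^2/(200|\mathcal{I}|^2)$. For all the $e'$ in
$\mathcal{I}$ that are not equal to $e$, we set $\varphi_{e',k-1}$ to be an arbitrary basic
$e'$-anti-uniform function of scale $M_{k-1}$ (e.g. one could set $\varphi_{e',k-1} := 1$).

• Step 3. We decrement $k$ to $k - 1$. If $k = 1$ then we STOP with an error;
otherwise we return to Step 1.

Once again, we are done if this algorithm terminates at some $k \ge 2$, so suppose
instead for contradiction that the algorithm goes all the way down to $k = 1$. Then,
by construction, we have constructed $\varphi_{e,j}$ for $e \in \mathcal{I}$ and $1 \le j \le K$, with the
property that for every $1 \le j \le K$ there exists $e \in \mathcal{I}$ such that

$\big|\langle g_e - \mathbb{E}(g_e|\mathcal{Y}_{e,\ge j+1}), \varphi_{e,j}\rangle_{L^2(\mathbb{Z}_P^{l+1})}\big| \ge \frac{\varepsilon^2}{200|\mathcal{I}|^2}.$

By arguing exactly as in the proof of Proposition 5.6 we then conclude that

$\|\mathbb{E}(g_e|\mathcal{Y}_{e,\ge j})\|^2_{L^2(\mathbb{Z}_P^{l+1})} \ge \|\mathbb{E}(g_e|\mathcal{Y}_{e,\ge j+1})\|^2_{L^2(\mathbb{Z}_P^{l+1})} + \frac{\varepsilon^4}{10^6 |\mathcal{I}|^4}$

for this value of $e$. On the other hand, from Pythagoras' theorem we have

$\|\mathbb{E}(g_{e'}|\mathcal{Y}_{e',\ge j})\|_{L^2(\mathbb{Z}_P^{l+1})} \ge \|\mathbb{E}(g_{e'}|\mathcal{Y}_{e',\ge j+1})\|_{L^2(\mathbb{Z}_P^{l+1})}$

for all other values of $e' \in \mathcal{I}$. Thus if we define

$c_j := \sum_{e' \in \mathcal{I}} \|\mathbb{E}(g_{e'}|\mathcal{Y}_{e',\ge j})\|^2_{L^2(\mathbb{Z}_P^{l+1})}$

then we have

$c_j \ge c_{j+1} + \frac{\varepsilon^4}{10^6 |\mathcal{I}|^4}.$

On the other hand, $c_j$ varies between $0$ and $|\mathcal{I}|$. This contradicts the choice of $K$,
and Proposition 6.5 follows. □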



We apply this proposition to obtain $2 \le k \le K$, basic $e$-anti-uniform functions $\varphi_{e,j}$
for $e \in \mathcal{I}$ and $k \le j \le K$, and decompositions $g_e = g_{e,U^\perp} + g_{e,U}$ with the stated
properties.

Let $M_*$ be the first integer greater than $\frac{1000 |\mathcal{I}|^2 M_{k-1}}{\varepsilon^2}$, and let $M_{**}$ be the first
integer such that $F(M_{**}) \ge M_k^{1/4}$ (so in particular $M_{**} \le M_k^{1/4} + 1$). Thus

$1 \le M_{k-1} \le M_* \le M_{**} \le M_k \le \dots \le M_K.$

To prove Theorem 6.1 (with $M^* := M_K$), it will suffice to show that, for $\tilde F$ chosen
appropriately depending on $F$ and $\varepsilon$, there exists $M_* \le M \le M_{**}$ such that

(24)   $\Big\|A_N\Big(\prod_{e \in \mathcal{I}} g_e\Big) - A_{N'}\Big(\prod_{e \in \mathcal{I}} g_e\Big)\Big\|_{L^2(\mathbb{Z}_P^l)} \le \varepsilon$

for all $M \le N, N' \le F(M)$. Note that since $M_k = \tilde F(M_{k-1})$, we can make $M_{**}$
larger than any specified function of $M_*$ by choosing $\tilde F$ to be sufficiently rapidly
growing.

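Let us enumerate $\mathcal{I}$ arbitrarily as $\mathcal{I} = \{e_1,\dots,e_{|\mathcal{I}|}\}$. From (23) we have

$\Big\|A_N\Big(\Big(\prod_{1 \le j' < j} g_{e_{j'},U^\perp}\Big)\, g_{e_j,U}\, \Big(\prod_{j < j' \le |\mathcal{I}|} g_{e_{j'}}\Big)\Big)\Big\|_{L^2(\mathbb{Z}_P^l)} \le \frac{\varepsilon}{10|\mathcal{I}|}$

for all $1 \le j \le |\mathcal{I}|$ and all $N \ge M_*$. From the standard telescoping identity

$\prod_{j=1}^{|\mathcal{I}|} g_{e_j} - \prod_{j=1}^{|\mathcal{I}|} g_{e_j,U^\perp} = \sum_{j=1}^{|\mathcal{I}|} \Big(\prod_{1 \le j' < j} g_{e_{j'},U^\perp}\Big)\, g_{e_j,U}\, \Big(\prod_{j < j' \le |\mathcal{I}|} g_{e_{j'}}\Big)$

and the triangle inequality, we conclude that

$\Big\|A_N\Big(\prod_{e \in \mathcal{I}} g_e\Big) - A_N\Big(\prod_{e \in \mathcal{I}} g_{e,U^\perp}\Big)\Big\|_{L^2(\mathbb{Z}_P^l)} \le \varepsilon/10.$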
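The telescoping identity used here is purely algebraic: for any numbers $a_j, b_j$ one has $\prod_j a_j - \prod_j b_j = \sum_j (\prod_{j' < j} b_{j'})(a_j - b_j)(\prod_{j' > j} a_{j'})$. A short exact check in Python follows, with $a_j$ playing the role of $g_{e_j}$, $b_j$ that of $g_{e_j,U^\perp}$ and $a_j - b_j$ that of $g_{e_j,U}$ at a fixed point; the function name is illustrative, and the ordering of the factors is one of two equivalent conventions.

```python
from math import prod


def telescoping_terms(a, b):
    """Terms (prod_{i<j} b_i) * (a_j - b_j) * (prod_{i>j} a_i) for each index j."""
    return [prod(b[:j]) * (a[j] - b[j]) * prod(a[j + 1:])
            for j in range(len(a))]
```

Summing the returned list recovers `prod(a) - prod(b)`, which is why each term can be bounded separately by (23) and the bounds added with the triangle inequality.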



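By the triangle inequality again, we see that to show (24), it suffices to find $M_* \le
M \le M_{**}$ such that

(25)   $\Big\|A_N\Big(\prod_{e \in \mathcal{I}} g_{e,U^\perp}\Big) - A_{N'}\Big(\prod_{e \in \mathcal{I}} g_{e,U^\perp}\Big)\Big\|_{L^2(\mathbb{Z}_P^l)} \le \varepsilon/10$

for all $M \le N, N' \le F(M)$.

We have reduced to the "characteristic factor" of anti-uniform functions, and will
now break these functions up into their basic components. Let $\eta_1 > 0$ be a small
quantity to be chosen later. By applying Lemma 3.6 precisely as in the preceding
section, we see that for every $e \in \mathcal{I}$ there exists a polynomial $\Psi_e : \mathbb{R}^{K-k+1} \to \mathbb{R}$ of
$K - k + 1$ variables with degree and coefficients $O_{\varepsilon,\eta_1}(1)$ such that

$\|g_{e,U^\perp} - \Psi_e(\varphi_{e,k},\dots,\varphi_{e,K})\|_{L^1(\mathbb{Z}_P^{l+1})} \ll_\varepsilon \eta_1$

and

$\|g_{e,U^\perp} - \Psi_e(\varphi_{e,k},\dots,\varphi_{e,K})\|_{L^\infty(\mathbb{Z}_P^{l+1})} \ll_\varepsilon 1$

(note that as we are allowing implied constants to depend on $l$ and $d$, we have
$|\mathcal{I}| = O(1)$ and $K = O_\varepsilon(1)$). Now, for any $e$-measurable function $h_e : \mathbb{Z}_P^{l+1} \to \mathbb{R}$
and any integer $n$, one can use the $e$-measurability to check that the function
$v \mapsto h_e((v, -\Sigma(v) - n))$ is a permutation of $h_e$ and thus has the same $L^1$
norm. Averaging this in $n$ using Minkowski's inequality we conclude that

$\|A_N(h_e)\|_{L^1(\mathbb{Z}_P^l)} \le \|h_e\|_{L^1(\mathbb{Z}_P^{l+1})}$

for any $e$-measurable function $h_e : \mathbb{Z}_P^{l+1} \to \mathbb{R}$, and thus

$\|A_N(h_e b)\|_{L^1(\mathbb{Z}_P^l)} \le \|h_e\|_{L^1(\mathbb{Z}_P^{l+1})}$

for any $e$-measurable function $h_e : \mathbb{Z}_P^{l+1} \to \mathbb{R}$ and any function $b : \mathbb{Z}_P^{l+1} \to [-1,1]$.
Because of this and many applications of the triangle inequality we see that

$\Big\|A_N\Big(\prod_{e \in \mathcal{I}} g_{e,U^\perp}\Big) - A_N(h)\Big\|_{L^1(\mathbb{Z}_P^l)} \ll_\varepsilon \eta_1$

and

$\Big\|A_N\Big(\prod_{e \in \mathcal{I}} g_{e,U^\perp}\Big) - A_N(h)\Big\|_{L^\infty(\mathbb{Z}_P^l)} \ll_\varepsilon 1$

where

(26)   $h := \prod_{e \in \mathcal{I}} \Psi_e(\varphi_{e,k},\dots,\varphi_{e,K}).$

In particular, we have

$\Big\|A_N\Big(\prod_{e \in \mathcal{I}} g_{e,U^\perp}\Big) - A_N(h)\Big\|_{L^2(\mathbb{Z}_P^l)} \ll_\varepsilon \eta_1^{1/2}.$

Similarly for $N$ replaced by $N'$. Thus if we choose $\eta_1$ sufficiently small depending
on $\varepsilon$, we see from the triangle inequality that to show (25) it suffices to show that
there exists $M_* \le M \le M_{**}$ such that

(27)   $\|A_N(h) - A_{N'}(h)\|_{L^2(\mathbb{Z}_P^l)} \le \varepsilon/20,$

for all $M \le N, N' \le F(M)$.

Henceforth we fix $\eta_1$ depending on $\varepsilon$ so that the above reductions hold.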

In principle, the induction hypothesis should now let us conclude the argument.
Unfortunately, the function $h$ is not quite a function of complexity $d - 1$, because
of the localisations to scale $M_j$ present inside the basic $e$-anti-uniform functions
$\varphi_{e,j}$. Fortunately (as in the previous section), these scales are very large, indeed
$M_j \ge M_k \ge \tilde F(M_{k-1})$, and since we have the freedom to choose $\tilde F$ at will, this
localisation will end up causing no difficulty.

We turn to the details. It will be convenient to localise the spatial variable to the
scale $L := \lfloor M_k^{1/2} \rfloor$; note that this scale is intermediate between the coarse scales
$M_k,\dots,M_K$ and the fine scales $M_*, M_{**}$. We can rewrite the left-hand side of
(27) as

$\Big(\mathbb{E}_{v \in \mathbb{Z}_P^l}\, \mathbb{E}_{w \in [L]^l}\, \big|A_N(h)(v + w) - A_{N'}(h)(v + w)\big|^2\Big)^{1/2}$

which we expand a little further using Definition 3.17 as

(28)   $\Big(\mathbb{E}_{v \in \mathbb{Z}_P^l}\, \mathbb{E}_{w \in [L]^l}\, \big|\mathbb{E}_{n \in [N]} h(v + w, -\Sigma(v + w) - n) - \mathbb{E}_{n \in [N']} h(v + w, -\Sigma(v + w) - n)\big|^2\Big)^{1/2}.$

We can approximate $h$ as an average of complexity $d - 1$ functions:

Lemma 6.7 (h essentially has complexity d — 1). For v G 7} p , w G [L] 1 , and 
n G [-/V] U [N'], we can write 

h(v + w, -E(v + w) -ri) = E ?aeM </Vm(w, -E(w) - ri) + O e (M k ), 

where M is a finite set, and for each rfi G M, (f v .,% : Z' +1 x Z — > R is an elementary 
function of complexity at most (d — 1, O e (l)). 

Remark 6.8. The parameter $\vec m \in \mathcal{M}$ shall play a "passive" role and will eventually
be absorbed into a probability space $X$ when we apply the induction hypothesis.



Proof. From (26) we know that $h(v+w, -\Sigma(v+w) - n)$ is a polynomial combination
of the quantities $\psi_{e,j}(v+w, -\Sigma(v+w) - n)$. On the other hand, from Definition
6.2 we can write

$$\psi_{e,j}(v+w, -\Sigma(v+w) - n) := \mathbf{E}_{m_j \in [M_j]} b_{e,i,j}\Bigl( (v_s + w_s - n\delta_{s,l+1})_{s \in e \setminus \{i\}}, \sum_{s \in e} v_s + \sum_{s \in e} w_s - n + m_j \Bigr)$$

for all $v = (v_1, \ldots, v_l) \in \mathbf{Z}_P^l$, $w = (w_1, \ldots, w_l)$, and $n \in [N] \cup [N']$, where
we adopt the conventions that $v_{l+1} := -\Sigma(v)$, $w_{l+1} := -\Sigma(w)$, and $\delta_{s,l+1}$ is the
Kronecker delta, equal to 1 when $s = l+1$ and 0 otherwise.

Now since $N, N' \le F(M)$ and $M \le M^{*}$ we see that $N, N' \le M_k^{1/4}$. Thus we see
that $\sum_{s \in e} w_s - n = O(M_k^{1/2})$. On the other hand, $M_j \ge M_k$. Thus we can shift $m_j$
by $\sum_{s \in e} w_s - n$ and write

$$\psi_{e,j}(v+w, -\Sigma(v+w) - n) = \mathbf{E}_{m_j \in [M_j]} \psi_{e,j,v,m_j}(w, -\Sigma(w) - n) + O(M_k^{-1/2})$$

where $\psi_{e,j,v,m_j} : \mathbf{Z}^{l+1} \to [-1,1]$ is the function

From Definition 3.12 we observe that $\psi_{e,j,v,m_j}$ is a basic function of complexity at
most $d-1$.
Applying (26) (and recalling that $\Psi_e$ has degree and coefficients $O_\varepsilon(1)$), we can
now write

$$h(v+w, -\Sigma(v+w) - n) = \prod_{e \in I} \Psi_e\Bigl( \bigl( \mathbf{E}_{m_j \in [M_j]} \psi_{e,j,v,m_j}(w, -\Sigma(w) - n) \bigr)_{j \le k} \Bigr) + O_\varepsilon(M_k^{-1/2}).$$
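The $O(M_k^{-1/2})$ error incurred by shifting $m_j$ is a boundary effect of averaging over an interval. A minimal sketch of the estimate, assuming that $[M]$ denotes a set of $M$ consecutive integers; the names $g$ and $t$ are introduced here purely for illustration and do not appear in the text:

```latex
% Assumption: [M] is a set of M consecutive integers, g : Z -> [-1,1], t an integer.
\begin{align*}
\Bigl| \mathbf{E}_{m \in [M]} g(m + t) - \mathbf{E}_{m \in [M]} g(m) \Bigr|
  &= \frac{1}{M} \Bigl| \sum_{m \in [M] + t} g(m) - \sum_{m \in [M]} g(m) \Bigr| \\
  &\le \frac{\bigl| ([M] + t) \,\triangle\, [M] \bigr|}{M}
   \le \frac{2|t|}{M}.
\end{align*}
% Applied with M = M_j \ge M_k and t = \sum_{s \in e} w_s - n = O(M_k^{1/2}),
% the right-hand side is O(M_k^{1/2} / M_j) = O(M_k^{-1/2}).
```

Only the bound $|g| \le 1$ and the interval structure of $[M]$ are used, which is why the exact form of the shifted function plays no role in the error term.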

We expand out the polynomials and collect all the $m_j$ averages, and eventually
rewrite the right-hand side in the form

$$\mathbf{E}_{\vec m \in \mathcal{M}} \sum_{a \in A} c_a \psi_{v,\vec m,a}(w, -\Sigma(w) - n) + O_\varepsilon(M_k^{-1/2}),$$

where $\mathcal{M}$ is a finite index set (it is the product of finitely many intervals of the form
$[M_j]$), $A$ is another index set of size $O_\varepsilon(1)$, the coefficients $c_a$ are numbers of size
$O_\varepsilon(1)$, and the $\psi_{v,\vec m,a} : \mathbf{Z}^{l+1} \to [-1,1]$ are various basic functions of complexity
at most $d-1$ whose exact form is not of importance to us (they are products of
various $\psi_{e,j,v,m_j}$, where the $m_j$ are drawn from components of the $\vec m$). If we then
define

$$\varphi_{v,\vec m} := \sum_{a \in A} c_a \psi_{v,\vec m,a}$$

we obtain the claim. □
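From this lemma and (28), we can now bound the left-hand side of (27) by

$$\left( \mathbf{E}_{v \in \mathbf{Z}_P^l, w \in [L]^l} \Bigl| \mathbf{E}_{\vec m \in \mathcal{M}} \bigl( \mathbf{E}_{n \in [N]} \varphi_{v,\vec m}(w, -\Sigma(w) - n) - \mathbf{E}_{n \in [N']} \varphi_{v,\vec m}(w, -\Sigma(w) - n) \bigr) \Bigr|^2 \right)^{1/2} + O_\varepsilon(M_k^{-1/4})$$

which by Cauchy-Schwarz can be bounded by

$$\left( \mathbf{E}_{v \in \mathbf{Z}_P^l, \vec m \in \mathcal{M}} \mathbf{E}_{w \in [L]^l} \bigl| \mathbf{E}_{n \in [N]} \varphi_{v,\vec m}(w, -\Sigma(w) - n) - \mathbf{E}_{n \in [N']} \varphi_{v,\vec m}(w, -\Sigma(w) - n) \bigr|^2 \right)^{1/2} + O_\varepsilon(M_k^{-1/4}).$$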
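The Cauchy-Schwarz step here amounts to moving the average over the passive parameter outside the square. A short sketch, where $a_{v,w,\vec m}$ is shorthand introduced only for this illustration (it is not notation from the text) for the difference of the two $n$-averages:

```latex
% Hypothetical shorthand for this sketch only:
%   a_{v,w,\vec m} := E_{n \in [N]}  \varphi_{v,\vec m}(w, -\Sigma(w) - n)
%                   - E_{n \in [N']} \varphi_{v,\vec m}(w, -\Sigma(w) - n).
\begin{align*}
\Bigl| \mathbf{E}_{\vec m \in \mathcal{M}} a_{v,w,\vec m} \Bigr|^2
  &\le \mathbf{E}_{\vec m \in \mathcal{M}} |a_{v,w,\vec m}|^2
  && \text{(Cauchy-Schwarz against the constant function 1),} \\
\mathbf{E}_{v,w} \Bigl| \mathbf{E}_{\vec m \in \mathcal{M}} a_{v,w,\vec m} \Bigr|^2
  &\le \mathbf{E}_{v,\vec m} \, \mathbf{E}_{w} |a_{v,w,\vec m}|^2
  && \text{(average the first line over } v, w\text{).}
\end{align*}
% Taking square roots gives the second displayed bound; the additive error term is unchanged.
```

After this step $\vec m$ appears only as an outer averaging variable, on the same footing as $v$.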

The next step is to move from $[L]^l$ to a cyclic group. Let $Y$ be the finite set $\mathbf{Z}_P^l \times \mathcal{M}$,
which we endow with the uniform measure $\mu_Y$. Let $Q := (l+1)L$. We define the
function $\varphi : \mathbf{Z}_Q^{l+1} \times Y \to [-1,1]$ by defining

$$\varphi((w_1, \ldots, w_l, w_{l+1}), (v, \vec m)) := \varphi_{v,\vec m}(w_1, \ldots, w_l, w_{l+1})$$

when $v \in \mathbf{Z}_P^l$, $\vec m \in \mathcal{M}$, $w_1, \ldots, w_l \in [L]$ and $w_{l+1} \in \{-1, \ldots, -Q\}$ (where we
identify integers with elements of $\mathbf{Z}_Q$ in the usual manner), and $\varphi = 0$ otherwise.
Note that as $\varphi_{v,\vec m} : \mathbf{Z}^{l+1} \to \mathbf{R}$ is an elementary function of complexity at most
$(d-1, O_\varepsilon(1))$, the function $\varphi : \mathbf{Z}_Q^{l+1} \times Y \to \mathbf{R}$ is also.

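Since $|\mathbf{Z}_Q^l| \ll |[L]^l|$, one can bound the left-hand side of (27) by

$$\ll \left( \mathbf{E}_{y \in Y, w \in \mathbf{Z}_Q^l} \bigl| \mathbf{E}_{n \in [N]} \varphi((w, -\Sigma(w) - n), y) - \mathbf{E}_{n \in [N']} \varphi((w, -\Sigma(w) - n), y) \bigr|^2 \right)^{1/2} + O_\varepsilon(M_k^{-1/4}),$$

which by Definition 3.17 can be expressed as

$$\ll \|A_N \varphi - A_{N'} \varphi\|_{L^2(\mathbf{Z}_Q^{l+1} \times Y)} + O_\varepsilon(M_k^{-1/4}),$$

where we have abused notation slightly and viewed $A_N \varphi - A_{N'} \varphi$ as a function on $\mathbf{Z}_Q^{l+1} \times Y$
instead of $\mathbf{Z}_Q^l \times Y$ by adding a dummy variable. But we can now apply the inductive
hypothesis, Theorem 4.1 with $d$, $P$, $X$, $g$, $\varepsilon$, $M_*$, $J$ replaced by $d-1$, $Q$, $Y$, $\varphi$,
$\varepsilon/C$, $M_{**}$, and $O_\varepsilon(1)$ respectively for some large absolute constant $C$, and conclude
the existence of $M_{**} \le M \le O_{F,\varepsilon,C,M_{**}}(1)$ such that

$$\|A_N \varphi - A_{N'} \varphi\|_{L^2(\mathbf{Z}_Q^{l+1} \times Y)} \le \varepsilon/C$$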

for all $M \le N, N' \le F(M)$. If we choose $F$ to be sufficiently fast-growing depending
on $F$, $\varepsilon$, $C$, we can ensure that $M^{**} \ge M$. The left-hand side of (27) is now bounded
by

$$\ll \varepsilon/C + O_\varepsilon(M_k^{-1/4}).$$

By making $C$ sufficiently large, and making $F$ sufficiently fast-growing depending
on $\varepsilon$, we thus establish (27). This establishes Theorem 6.1 and hence (by induction)
Theorem 4.1. Theorem 1.6 and Theorem 1.1 then follow. □

Appendix A. A quantitative dominated convergence theorem 

We recall the following version of the Lebesgue dominated convergence theorem
on the net $\mathbf{N}^2$:

Theorem A.1 (Lebesgue dominated convergence theorem for $\mathbf{N}^2$). Let $(X, \mathcal{X}, \mu)$
be a probability space, and for each $n, n' \in \mathbf{N}$ let $f_{n,n'} : X \to [0,1]$ be a measurable
function. If, for almost every $x \in X$, we have $\lim_{n,n' \to \infty} f_{n,n'}(x) = 0$, then we have
$\lim_{n,n' \to \infty} \int_X f_{n,n'}(x) \, d\mu(x) = 0$.

In this appendix we apply a correspondence principle (essentially the Furstenberg 
correspondence principle) to transfer this infinitary theorem to a finitary counter- 
part, which may be of some independent interest. More precisely, we have 

Theorem A.2 (Finitary Lebesgue dominated convergence theorem). Suppose we
have a positive integer $M_{*,F,\varepsilon}$ assigned to each $\varepsilon > 0$ and each function $F : \mathbf{N} \to \mathbf{N}$.
Then for every $\varepsilon' > 0$ and every $F' : \mathbf{N} \to \mathbf{N}$ we can find a positive integer $M'_{*,F',\varepsilon'}$
with the following property: given any probability space $(X, \mathcal{X}, \mu)$, and sequence
$f_{n,n'} : X \to [0,1]$ of measurable functions with the quantitative convergence property

(*) For every $\varepsilon > 0$ and every $F : \mathbf{N} \to \mathbf{N}$, for almost every $x \in X$ there exists
an integer $1 \le M \le M_{*,F,\varepsilon}$ such that $f_{n,n'}(x) \le \varepsilon$ for all $M \le n, n' \le F(M)$.

there exists an integer $1 \le M \le M'_{*,F',\varepsilon'}$ such that

$$\int_X f_{n,n'} \, d\mu \le \varepsilon'$$

for all $M \le n, n' \le F'(M)$.

Remark A.3. In this theorem, the indices $n, n'$ are ranging over the net $\mathbf{N}^2$, but it
will be clear from the proof that one could in fact work with any countable net.

Proof. Let us fix the assignment $(\varepsilon, F) \mapsto M_{*,F,\varepsilon}$, as well as the quantity $\varepsilon' > 0$
and the function $F'$. We may assume that $F'(M) \ge M$ for all $M$ since the
claim is vacuous otherwise. Suppose for contradiction that the theorem failed for
these parameters. Untangling all the quantifiers carefully (and using the axiom
of choice), this means that for every integer $m$ we can find a probability space
$(X^{(m)}, \mathcal{X}^{(m)}, \mu^{(m)})$ and a family $f^{(m)}_{n,n'} : X^{(m)} \to [0,1]$ of sequences obeying the
property (*), but such that

(29) $$\sup_{M \le n, n' \le F'(M)} \int_{X^{(m)}} f^{(m)}_{n,n'} \, d\mu^{(m)} > \varepsilon'$$

for all $1 \le M \le m$.

Let $[0,1]^{\mathbf{N}^2}$ be the space of all functions $g : \mathbf{N} \times \mathbf{N} \to [0,1]$; by Tychonoff's theorem,
this is a compact Hausdorff topological space with the product topology and the
usual Borel $\sigma$-algebra $\mathcal{B}$, which is countably generated.

Observe that we have the maps $f^{(m)} : X^{(m)} \to [0,1]^{\mathbf{N}^2}$ for each $m \ge 1$ defined by

$$f^{(m)}(x)(n, n') := f^{(m)}_{n,n'}(x).$$

One easily verifies that this map is measurable. Thus, we can push forward the
probability measure $\mu^{(m)}$ by $f^{(m)}$ to create a probability measure $\nu^{(m)} := (f^{(m)})_* \mu^{(m)}$
on $[0,1]^{\mathbf{N}^2}$. The space of probability measures on the countably generated
$\sigma$-algebra $\mathcal{B}$ is weakly sequentially compact. What this means is that we can find
a subsequence $\nu^{(m_j)}$ of the probability measures $\nu^{(m)}$ which converge weakly to
another probability measure $\nu$ on $[0,1]^{\mathbf{N}^2}$ in the sense that

(30) $$\lim_{j \to \infty} \nu^{(m_j)}(A) = \nu(A)$$

for any elementary set $A$. Indeed, for each elementary set $A$ one can refine the
subsequence $m_j$ so that $\nu^{(m_j)}(A)$ is convergent, and then by the usual Arzelà-Ascoli
type diagonalisation argument we can ensure that $\nu^{(m_j)}(A)$ converges to a
limit $\nu(A)$ for all elementary sets $A$. One can then use the Carathéodory extension
theorem or Kolmogorov extension theorem to extend $\nu$ to a probability measure.

Fix this subsequence $m_j$ and the limit measure $\nu$. For any natural numbers $n, n' \in \mathbf{N}$,
let $\pi_{n,n'} : [0,1]^{\mathbf{N}^2} \to [0,1]$ be the coordinate projection $\pi_{n,n'}(g) := g(n,n')$.
These functions are continuous on $[0,1]^{\mathbf{N}^2}$ and hence measurable; indeed we see
that $\pi_{n,n'}^{-1}([a,b))$ is an elementary set for any interval $[a,b)$ with rational endpoints.
From (29) and the definition of $\nu^{(m)}$ and $\pi_{n,n'}$ we see that

$$\sup_{M \le n, n' \le F'(M)} \int_{[0,1]^{\mathbf{N}^2}} \pi_{n,n'}(y) \, d\nu^{(m)}(y) > \varepsilon'$$

for all $1 \le M \le m$. Fixing $M$, specialising $m$ to $m_j$ for $j$ sufficiently large, and
then taking limits as $j \to \infty$ using the weak convergence of the $\nu^{(m_j)}$ (noting that
the level sets of $\pi_{n,n'}$ are elementary sets), we conclude that

$$\sup_{M \le n, n' \le F'(M)} \int_{[0,1]^{\mathbf{N}^2}} \pi_{n,n'}(y) \, d\nu(y) \ge \varepsilon'$$
for all $M \ge 1$. Observe that the function $(n,n') \mapsto \pi_{n,n'}(y)$ is (tautologically) a
pseudometric taking values in $[0,1]$ for each $y \in [0,1]^{\mathbf{N}^2}$. Applying Theorem A.1 in
the contrapositive, we conclude that

$$\nu\Bigl( \bigl\{ y \in [0,1]^{\mathbf{N}^2} : \lim_{M \to \infty} \sup_{n,n' \ge M} \pi_{n,n'}(y) > 0 \bigr\} \Bigr) > 0.$$

By countable subadditivity$^{12}$, this implies that there exists an $\varepsilon > 0$ such that

$$\nu\Bigl( \bigl\{ y \in [0,1]^{\mathbf{N}^2} : \sup_{n,n' \ge M} \pi_{n,n'}(y) > 2\varepsilon \text{ for all } M \ge 1 \bigr\} \Bigr) > 0.$$
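The passage from a positive limit to a single $\varepsilon$ is a routine union bound. A sketch of the step, where the sets $B_k$ are notation introduced only for this illustration:

```latex
% B_k is hypothetical shorthand, not notation from the paper.
\[
B_k := \Bigl\{ y \in [0,1]^{\mathbf{N}^2} :
   \sup_{n,n' \ge M} \pi_{n,n'}(y) > \tfrac{2}{k} \text{ for all } M \ge 1 \Bigr\},
\qquad k = 1, 2, \ldots
\]
% The quantity sup_{n,n' >= M} pi_{n,n'}(y) is non-increasing in M, so its limit is its infimum.
% If that limit is positive it exceeds 2/k for some k, and then every term exceeds 2/k:
\[
\Bigl\{ y : \lim_{M \to \infty} \sup_{n,n' \ge M} \pi_{n,n'}(y) > 0 \Bigr\}
   \subseteq \bigcup_{k=1}^{\infty} B_k ,
\qquad\text{hence}\qquad
0 < \sum_{k=1}^{\infty} \nu(B_k).
\]
% So nu(B_k) > 0 for at least one k, and one may take epsilon := 1/k.
```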

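If we define the sets

$$E_{M,M'} := \bigl\{ y \in [0,1]^{\mathbf{N}^2} : \sup_{M \le n, n' \le M'} \pi_{n,n'}(y) > 2\varepsilon \bigr\}$$

for all $1 \le M \le M'$, we thus see that

$$\nu\Bigl( \bigcap_{M=1}^{\infty} \bigcup_{M'=M}^{\infty} E_{M,M'} \Bigr) > 0.$$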

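By using countable subadditivity recursively, we can thus find an integer $F(M) \ge M$
associated to every $M \ge 1$ such that

$$\nu\Bigl( \bigcap_{M=1}^{M_0} E_{M,F(M)} \cap \bigcap_{M=M_0+1}^{\infty} \bigcup_{M'=M}^{\infty} E_{M,M'} \Bigr) > 0$$

for all $M_0 \ge 1$, and in particular that

(31) $$\nu\Bigl( \bigl\{ y \in [0,1]^{\mathbf{N}^2} : \inf_{1 \le M \le M_0} \sup_{M \le n, n' \le F(M)} \pi_{n,n'}(y) > 2\varepsilon \bigr\} \Bigr) > 0$$

for all $M_0 \ge 1$.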
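The recursive choice of $F$ can be spelled out as follows. This is only a sketch of one way to organise the induction; the sets $G_{M_0}$ are shorthand introduced here and do not appear in the text.

```latex
% G_{M_0} is hypothetical shorthand for the set whose measure is being kept positive.
\[
G_{M_0} := \bigcap_{M=1}^{M_0} E_{M,F(M)} \;\cap\;
           \bigcap_{M=M_0+1}^{\infty} \bigcup_{M'=M}^{\infty} E_{M,M'},
\qquad \nu(G_0) > 0 .
\]
% Inductive step: suppose F(1), ..., F(M_0) are chosen with nu(G_{M_0}) > 0.
% Expanding the union at the index M = M_0 + 1 gives
\[
G_{M_0} = \bigcup_{M'=M_0+1}^{\infty}
  \Bigl( \bigcap_{M=1}^{M_0} E_{M,F(M)} \cap E_{M_0+1,M'} \cap
         \bigcap_{M=M_0+2}^{\infty} \bigcup_{M''=M}^{\infty} E_{M,M''} \Bigr).
\]
% By countable subadditivity at least one M' >= M_0 + 1 gives a set of positive measure;
% setting F(M_0+1) := M' yields nu(G_{M_0+1}) > 0.
% Finally G_{M_0} is contained in \bigcap_{M=1}^{M_0} E_{M,F(M)}, which is the set appearing in (31).
```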

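Fix this $F$. We apply hypothesis (*) for the sequences $f^{(m_j)}_{n,n'}$ and conclude that
for all $j$ and $\mu^{(m_j)}$-almost every $x \in X^{(m_j)}$ we have

$$\inf_{1 \le M \le M_{*,F,\varepsilon}} \sup_{M \le n, n' \le F(M)} f^{(m_j)}_{n,n'}(x) \le \varepsilon.$$

Equivalently, by the definition of $\nu^{(m_j)}$ and $\pi_{n,n'}$ we have

$$\nu^{(m_j)}\Bigl( \bigl\{ y \in [0,1]^{\mathbf{N}^2} : \inf_{1 \le M \le M_{*,F,\varepsilon}} \sup_{M \le n, n' \le F(M)} \pi_{n,n'}(y) \le \varepsilon \bigr\} \Bigr) = 1.$$

Since $\nu^{(m_j)}$ converges weakly to $\nu$, and the subset of $[0,1]^{\mathbf{N}^2}$ appearing above is
compact and depends on only finitely many coordinates of $[0,1]^{\mathbf{N}^2}$, we conclude
that

$$\nu\Bigl( \bigl\{ y \in [0,1]^{\mathbf{N}^2} : \inf_{1 \le M \le M_{*,F,\varepsilon}} \sup_{M \le n, n' \le F(M)} \pi_{n,n'}(y) \le \varepsilon \bigr\} \Bigr) = 1.$$

But this contradicts (31). The proof of Theorem A.2 is complete. □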

Remark A.4. In principle, the quantity $M'_{*,F',\varepsilon'}$ can be explicitly computed from
$F'$, $\varepsilon'$, and the assignment $(F, \varepsilon) \mapsto M_{*,F,\varepsilon}$. In practice, though, it seems remarkably
hard to do; the proof of the Lebesgue dominated convergence theorem, if
inspected carefully, relies implicitly on the infinite pigeonhole principle, which is
notoriously hard to finitise. Indeed the situation here is somewhat reminiscent of
that of the Paris-Harrington theorem [18]. Note that it was established in [29] (see
also [20]) that the Lebesgue dominated convergence theorem is equivalent in the
reverse mathematics sense to the arithmetic comprehension axiom (ACA), which
does strongly suggest that the dependence of $M'_{*,F',\varepsilon'}$ on the above parameters is
likely to be fantastically poor.

12 This can be viewed as an infinite version of the pigeonhole principle, viz. if a set of positive
measure is covered by countably many measurable sets, then at least one of those sets also has
positive measure.

We will use Theorem A.2 to eliminate the role of various probability spaces $(X, \mathcal{X}, \mu)$
in our analysis. This elimination is not, strictly speaking, absolutely necessary for
us; we could instead passively carry such spaces with us throughout our arguments,
at the cost of making the notation in those arguments slightly more complicated.
We have however chosen this approach to highlight the finitary version of the
dominated convergence theorem, which is not so well-known in the literature.